# prefect-community
b
Hi all, we’ve run into an issue a few times and I’m curious if anyone can help me figure out where we’re going wrong. We’ve deployed a few flows to Prefect Cloud, but whenever we try to add retry logic using an S3Result, the container seems to fail to start, gets rescheduled by Lazarus 3 times, and dies with no other logs.
Until yesterday we had a problem where the Resource Manager was getting 403s when trying to fetch the logs, and we fixed that. Now we instead get a log line on only the first attempt:
unable to retrieve container logs for <docker://3c175d18c55f5a413fac5f71cc07c98f89d868f9a856416366>…….
n
Hi @Brian Mesick - which version of Prefect are you using?
b
The flow should be on 0.11.5
The agent image looks like 0.13.10
n
Got it, thanks @Brian Mesick. The resource manager has been replaced with an integrated resource maintenance process that should perform a lot better. Could you try removing references to the resource manager from however you're spinning up your agent? This'll help us debug a bit better.
b
Thanks @nicholas. So if I turn off the resource manager, 0.13.10 will be able to reap old pods for us without it?
Looks like that merged with 0.13.5, so I’m moving forward with removing references to it now. We’ve done this to debug issues in the past, but it’s a pretty clunky process to remove and re-add it every time we run into this.
n
You should be fine without it, @Brian Mesick, but let me know if you run into any issues
b
@nicholas OK, I redeployed the agent without the resource manager. Nothing new is showing up in the logs, just “submitted for execution” and “rescheduled by a lazarus process”. No new pod is showing up for the flow run either, or it’s disappearing before I can see it.
n
Hm, OK, this sounds like something that will need more triaging by the Core team; would you mind opening a ticket on the Core repo with details about your setup?
b
Sure thing, thanks
n
Thank you!