Prefect Community #ask-community

Brian Mesick

10/29/2020, 5:35 PM

Hi all, we’ve run into an issue a few times and I’m curious if anyone can help me figure out where we are going wrong. We’ve deployed a few flows to Prefect Cloud, but whenever we try to add RETRY logic using an S3Result, the container seems to fail to start, gets Lazarus kicked 3 times, and dies with no other logs.

Brian Mesick

10/29/2020, 5:37 PM

We had a problem up to yesterday where the Resource Manager was getting 403's trying to fetch the logs, and fixed that. Now instead we get a log line on only the first attempt:

unable to retrieve container logs for <docker://3c175d18c55f5a413fac5f71cc07c98f89d868f9a856416366>…….

nicholas

10/29/2020, 5:48 PM

Hi @Brian Mesick - which version of Prefect are you using?

Brian Mesick

10/29/2020, 5:51 PM

The flow should be on 0.11.5

Brian Mesick

10/29/2020, 5:52 PM

The agent image looks like 0.13.10

nicholas

10/29/2020, 5:56 PM

Got it, thanks @Brian Mesick - the resource manager has been replaced with an integrated resource maintenance process that should perform a lot better - could you try removing references to the resource manager however you're spinning up your agent? This'll help us debug a bit better.

Brian Mesick

10/29/2020, 6:05 PM

Thanks @nicholas. So if I turn off the resource manager, 0.13.10 will be able to reap old pods for us without it?

Brian Mesick

10/29/2020, 6:27 PM

Looks like that merged with 0.13.5, so I’m moving forward with removing references to it now. We’ve done this to debug issues in the past, but it’s a pretty clunky process to remove / add every time we run into this.
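
Editor's note: the check above ("is the agent at or past 0.13.5?") is easy to get wrong when versions are compared as plain strings, because "0.13.10" sorts before "0.13.5" alphabetically. A minimal sketch of a numeric comparison follows. The helper names are hypothetical, and the 0.13.5 threshold is taken only from the message above, not verified against the Prefect changelog.

```python
# Hypothetical helper: decide whether a separately deployed resource manager
# is still needed for a given agent version.

def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '0.13.10' into a tuple of ints: (0, 13, 10)."""
    return tuple(int(part) for part in version.split("."))


# Assumption from the thread: integrated cleanup merged with 0.13.5.
INTEGRATED_CLEANUP_SINCE = parse_version("0.13.5")


def needs_resource_manager(agent_version: str) -> bool:
    """Return True when the agent predates the assumed 0.13.5 threshold."""
    return parse_version(agent_version) < INTEGRATED_CLEANUP_SINCE


if __name__ == "__main__":
    for version in ("0.13.4", "0.13.5", "0.13.10"):
        print(version, needs_resource_manager(version))
```

Comparing tuples of integers places 0.13.10 after 0.13.5, which matches the agent image version mentioned earlier in the thread.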

nicholas

10/29/2020, 6:28 PM

You should be fine without it @Brian Mesick but let me know if you run into any issues

👍 1

Brian Mesick

10/29/2020, 9:11 PM

@nicholas Ok, I redeployed the agent without the resource manager. Nothing new is showing up in the logs, just “submitted for execution” and “rescheduled by a lazarus process”. No new pod is showing up for the flow run, either. Or it’s disappearing before I can see it.

nicholas

10/29/2020, 9:17 PM

Hm ok - this sounds like something that will take more triaging by the Core team; would you mind opening a ticket on the Core repo with details about your setup?

Brian Mesick

10/29/2020, 10:04 PM

Sure thing, thanks

nicholas

10/29/2020, 10:06 PM

Thank you!
