    Brian Mesick

    1 year ago
    Hi all, we’ve run into an issue a few times and I’m curious if anyone can help me figure out where we are going wrong. We’ve deployed a few flows to Prefect Cloud, but whenever we try to add RETRY logic using an S3Result, the container seems to fail to start, gets Lazarus kicked 3 times, and dies with no other logs.
    We had a problem until yesterday where the Resource Manager was getting 403s trying to fetch the logs, and we fixed that. Now we instead get a log line on only the first attempt:
    unable to retrieve container logs for <docker://3c175d18c55f5a413fac5f71cc07c98f89d868f9a856416366>…….
    nicholas

    1 year ago
    Hi @Brian Mesick - which version of Prefect are you using?
    Brian Mesick

    1 year ago
    The flow should be on 0.11.5
    The agent image looks like 0.13.10
    nicholas

    1 year ago
    Got it, thanks @Brian Mesick. The resource manager has been replaced with an integrated resource maintenance process that should perform a lot better. Could you try removing references to the resource manager from however you're spinning up your agent? This'll help us debug a bit better.
    Brian Mesick

    1 year ago
    Thanks @nicholas. So if I turn off the resource manager, 0.13.10 will be able to reap old pods for us without it?
    Looks like that merged with 0.13.5, so I’m moving forward with removing references to it now. We’ve done this to debug issues in the past, but it’s a pretty clunky process to remove / add every time we run into this.
    nicholas

    1 year ago
    You should be fine without it, @Brian Mesick, but let me know if you run into any issues.
    Brian Mesick

    1 year ago
    @nicholas Ok, I redeployed the agent without the resource manager. Nothing new is showing up in the logs, just “submitted for execution” and “rescheduled by a lazarus process”. No new pod is showing up for the flow run, either. Or it’s disappearing before I can see it.
    nicholas

    1 year ago
    Hm ok - this sounds like something that will take more triaging by the Core team; would you mind opening a ticket on the Core repo with details about your setup?
    Brian Mesick

    1 year ago
    Sure thing, thanks
    nicholas

    1 year ago
    Thank you!