    Zach Angell

    2 years ago
    Is there a good way to handle situations where a Prefect agent may shut down during flow execution? I'm looking into using EC2 spot instances as Prefect agents. These instances can theoretically be shut down without warning. Ideally, any flows that were running would be aware that something has disrupted execution and signal that the flow needs to be re-run. If I try killing the local agent process (Ctrl + C while the agent is mid-flow execution), it seems like tasks hang in the Submitted state by default.
    Jim Crist-Harif

    2 years ago
    Hi Zach, Prefect doesn't rely on the agent running during the whole flow execution. Agents are for kicking off flow runs: once the agent has kicked off a flow run on whatever backend it's using (e.g. EC2, local processes, etc.), the agent no longer tracks execution. The flow run is its own separate process.
    If a flow run dies mid execution, prefect cloud/server will notice and reschedule it depending on your flow configuration.
    The behavior you're seeing with the local agent is interesting. It might be a peculiarity of the local agent, and potentially a bug. Agents that don't run flows in local processes definitely won't have this issue, but I'm not sure whether the local agent intentionally kills child processes on exit. Let me take a quick look.
    Yeah, that's weird and unintentional. What OS are you running on? Also, are you in a Docker container?
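    As general background on how a Ctrl + C can reach child processes: on POSIX systems the terminal delivers SIGINT to every process in the foreground process group, and a child started with default settings stays in its parent's group. Whether that is what happens inside the local agent is an assumption, not something confirmed in this thread. The sketch below only shows the generic mechanism; launch_detached is a hypothetical helper and not part of Prefect.

```python
import os
import subprocess
import sys


def launch_detached(code: str) -> subprocess.Popen:
    """Start a Python child in its own session (POSIX only)."""
    # start_new_session=True makes the child call setsid(), so it leaves the
    # parent's process group and is not sent the terminal's Ctrl+C SIGINT.
    return subprocess.Popen([sys.executable, "-c", code], start_new_session=True)


child = launch_detached("import time; time.sleep(0.5)")
print("parent group:", os.getpgrp())
print("child group: ", os.getpgid(child.pid))  # differs from the parent's group
child.wait()
```

    Without start_new_session=True the two group IDs would normally match, which is why interrupting the parent from the terminal can also interrupt its children.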
    Zach Angell

    2 years ago
    macOS 10.14.6
    I don't think it's running from inside a Docker container, but it is running inside nested bash shells for pipenv and aws-vault
    Jim Crist-Harif

    2 years ago
    Ok, I can take a look into this. Thanks for the report. In the meantime, your main goal was to run the agent on EC2 spot instances. Did you also want the flow runs to run on these same instances, or were you planning on using the Fargate agent to launch them as separate Fargate tasks?
    The latter will work fine if the agent dies - the flow runs will continue to run. If the former, the flow runs will also die (since they're on the same instance), but cloud/server will eventually notice they've stopped running and will restart them as needed.
    Zach Angell

    2 years ago
    hmm okay let me think about what makes sense there
    "If a flow run dies mid execution, prefect cloud/server will notice and reschedule it depending on your flow configuration."
    re: the above, would the flow configuration be defining max_retries and retry_delay for tasks in the flow?
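    For readers unfamiliar with the two settings named here, this is a plain-Python stand-in for what a retry count and a retry delay usually mean. It is a sketch under assumptions: run_with_retries and its sleep parameter are hypothetical, and it is not Prefect's implementation or API.

```python
import time
from datetime import timedelta


def run_with_retries(fn, max_retries, retry_delay, sleep=time.sleep):
    """Call fn(); on failure, retry up to max_retries more times.

    Waits retry_delay (a timedelta) between attempts and returns
    (result, number_of_attempts). Re-raises once retries are used up.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return fn(), attempts
        except Exception:
            if attempts > max_retries:
                raise  # no retries left
            sleep(retry_delay.total_seconds())


def make_flaky(failures):
    """Build a function that raises `failures` times, then returns 'ok'."""
    state = {"left": failures}

    def flaky():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient failure")
        return "ok"

    return flaky


# Two transient failures, three retries allowed: succeeds on the third attempt.
print(run_with_retries(make_flaky(2), max_retries=3, retry_delay=timedelta(seconds=0)))
```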
    Jim Crist-Harif

    2 years ago
    Provided you don't have heartbeats disabled in your cloud tenant settings, a flow run will always be restarted once cloud notices missing heartbeats. Task-level retry limits will still be applied, so if a task has no retries left it won't be restarted.
    Oh, also, right now there is a fixed limit of 3 restarts for a flow run that dies mid execution. We might make that configurable in the future.
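    The two rules described above (a fixed cap of 3 restarts per flow run, and task-level retry limits still applying) can be written down as a small toy model. This is only a sketch of the behavior as stated in this thread, not Prefect's code; MAX_FLOW_RESTARTS, InterruptedTask, and plan_restart are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_FLOW_RESTARTS = 3  # the fixed limit mentioned above (name is hypothetical)


@dataclass
class InterruptedTask:
    name: str
    retries_left: int  # remaining task-level retries


def plan_restart(restarts_so_far: int,
                 interrupted: List[InterruptedTask]) -> Tuple[bool, List[str]]:
    """Return (restart_flow_run, names_of_tasks_to_rerun)."""
    if restarts_so_far >= MAX_FLOW_RESTARTS:
        return False, []  # restart budget for this flow run is used up
    # Only tasks that still have retries left are restarted.
    rerun = [t.name for t in interrupted if t.retries_left > 0]
    return True, rerun


tasks = [InterruptedTask("extract", retries_left=2),
         InterruptedTask("load", retries_left=0)]
print(plan_restart(0, tasks))  # (True, ['extract'])
print(plan_restart(3, tasks))  # (False, [])
```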
    Zach Angell

    2 years ago
    👍