Thread
#prefect-server
    jack
    9 months ago
    How to track reruns? We've been calling client.create_flow_run() to create several flow runs (using ECSRun), and then polling each with client.get_flow_run_state to know when all the flows have completed. When one of the flows fails (and Prefect starts a new flow run to take its place), how can we check when the rerun is complete (and whether it succeeded)?
    Kevin Kho
    9 months ago
    A bit confused about why a new flow run would take its place. But I think using the StartFlowRun task will give you what you want, because it raises the end state of the flow if you set wait=True. And then you can just use it like StartFlowRun(…).run(...) outside of a flow.
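A minimal sketch of the StartFlowRun suggestion above, assuming Prefect 1.x. With wait=True the task raises a state signal when the flow run finishes, so the caller has to catch it to read the outcome. The helper name run_and_get_final_state and the flow and project names are hypothetical, and reading the `state` attribute off the raised signal is an assumption about how Prefect state signals are shaped.

```python
def run_and_get_final_state(start_task, **run_kwargs):
    """Call start_task.run() and return the state carried by the raised signal.

    Assumes the raised exception exposes a `state` attribute, as Prefect 1.x
    state signals do. Any other exception is re-raised unchanged.
    """
    try:
        start_task.run(**run_kwargs)
    except Exception as signal:
        state = getattr(signal, "state", None)
        if state is None:
            raise  # not a state signal, so let it propagate
        return state
    return None  # run() returned normally, e.g. when wait=False


def example():
    # Not called here; it needs a reachable Prefect backend.
    from prefect.tasks.prefect import StartFlowRun  # Prefect 1.x import path

    # "my-flow" and "my-project" are placeholder names.
    task = StartFlowRun(flow_name="my-flow", project_name="my-project", wait=True)
    state = run_and_get_final_state(task)
    print(type(state).__name__)
```

Because each call blocks until its flow run finishes, this fits one run at a time; several concurrent runs would need threads or the polling approach from the original question.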
    jack
    9 months ago
    Updated question to clarify the number of flows being run. Guessing we don't want to use wait=True since it's multiple flow runs.
    Kevin Kho
    9 months ago
    Ok that makes sense. I am still a bit confused about what is starting a new flow run. Do you check for failure and then restart it? Is Prefect automatically restarting that (I don’t think we do)?
    jack
    9 months ago
    We check for failure but we do not manually restart. Let me verify what we're seeing here.
    Here are the highlights of the prefect logs:
    12:09pm (Some normal log output from the flow)
    12:12pm No heartbeat detected from the remote task; marking the run as failed.
    12:28pm Rescheduled by a Lazarus process. This is attempt 1.
    12:28pm Submitted for execution: Task arn:aws:ecs:
    12:29pm Beginning Flow run for xxx
    12:29pm Flow run FAILED: some reference tasks failed.
    Here is the GUI timeline
    Kevin Kho
    9 months ago
    Got it. That really helps. So Lazarus kicks in if the Flow can’t find the underlying compute to execute (Kubernetes or in this case ECS). Lazarus will re-submit the flow. Now to your question on how to get state. Basically you will need to use the GraphQL API I think. And then you can search by flow_id or by name and project, get the latest flow run, and then check the state.
    You can query with something like this (though this is for starting Flow Runs). The point is to use the client.graphql() method with your query to pull the info.
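A rough sketch of the GraphQL lookup described above: fetch the most recent flow run for a flow and read its state. It assumes Prefect 1.x, where client.graphql() accepts a query string and returns a mapping-like result. The query fields used here (flow_run, flow_id, created, state) are written from memory of the Prefect 1.x schema and should be checked against the Interactive API before use. The helper names are hypothetical.

```python
import json


def latest_flow_run_query(flow_id: str) -> str:
    """Build a query for the newest flow run of one flow (field names assumed)."""
    # json.dumps yields a double-quoted, escaped string literal that GraphQL accepts.
    return """
    query {
      flow_run(
        where: {flow_id: {_eq: %s}}
        order_by: {created: desc}
        limit: 1
      ) {
        id
        name
        state
      }
    }
    """ % json.dumps(flow_id)


def latest_flow_run(client, flow_id: str):
    """Return the newest flow run record for flow_id, or None if there is none.

    `client` is expected to be a prefect.Client (Prefect 1.x) or any object
    with a compatible graphql() method.
    """
    result = client.graphql(latest_flow_run_query(flow_id))
    runs = result["data"]["flow_run"]
    return runs[0] if runs else None
```

With a real client this would be called as latest_flow_run(prefect.Client(), "<flow id>"), and the returned record's "state" field would hold a state name such as "Success" or "Failed".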
    jack
    9 months ago
    Reading the Lazarus docs, it says Lazarus runs once every 10 minutes. Would it be easier for us to disable Lazarus for these flow runs, and then we could create a new flow run as soon as one is noticed to have failed?
    Kevin Kho
    9 months ago
    If that works for you, yep you can do that. Do you know how to disable Lazarus?
    jack
    9 months ago
    I don't know how to disable Lazarus
    Kevin Kho
    9 months ago
    You go to the Flow settings and then there is a toggle to turn it off
    jack
    9 months ago
    Can it be disabled from the ECSRun parameters?
    Found the settings
    Kevin Kho
    9 months ago
    I think the UI is easier. Yeah you can disable it there
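The approach settled on in this thread (disable Lazarus, then create a replacement flow run as soon as a failure is noticed) could be sketched roughly as follows. It assumes Prefect 1.x, using only client.create_flow_run() and client.get_flow_run_state() from the original question plus the is_finished() and is_failed() methods on the returned state. The function name, max_attempts, and poll_seconds are hypothetical choices, not anything prescribed by Prefect.

```python
import time


def run_until_done(client, flow_id, n_runs, max_attempts=2,
                   poll_seconds=10, sleep=time.sleep):
    """Start n_runs flow runs and resubmit failures up to max_attempts each.

    Returns a dict mapping each final flow run ID to its final state.
    `client` is expected to be a prefect.Client (Prefect 1.x).
    """
    # Map flow_run_id -> attempt number for runs still being tracked.
    pending = {client.create_flow_run(flow_id=flow_id): 1 for _ in range(n_runs)}
    finished = {}

    while pending:
        # Iterate over a snapshot because pending is modified in the loop.
        for run_id, attempt in list(pending.items()):
            state = client.get_flow_run_state(run_id)
            if not state.is_finished():
                continue
            del pending[run_id]
            if state.is_failed() and attempt < max_attempts:
                # Replace the failed run immediately instead of waiting for Lazarus.
                new_id = client.create_flow_run(flow_id=flow_id)
                pending[new_id] = attempt + 1
            else:
                finished[run_id] = state
        if pending:
            sleep(poll_seconds)

    return finished
```

Because the replacement run ID is recorded at the moment it is created, the caller always knows which run to poll, which sidesteps the original problem of tracking reruns started elsewhere.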