How to track reruns? We've been calling `client.c...
# prefect-server
How to track reruns? We've been calling
to create several flow runs (using ECSRun), and then polling each with
to know when all the flows have completed. When one of the flows fails (and Prefect starts a new flow run to take its place), how can we check when the rerun is complete and whether it succeeded?
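For reference, the polling described in the question can be sketched as one loop over all pending runs. This is a minimal sketch, not Prefect's API. `get_state` is a hypothetical callable you would supply (for example, something built on the Prefect 1.x `Client.get_flow_run_info` call; that wiring is assumed and not shown). The set of terminal state names is also an assumption to adjust for your setup.

```python
import time
from typing import Callable, Dict, Iterable

# Assumed terminal flow run states; adjust to the states your runs can end in.
TERMINAL_STATES = {"Success", "Failed", "Cancelled"}


def wait_for_all(
    flow_run_ids: Iterable[str],
    get_state: Callable[[str], str],  # hypothetical: flow run id -> state name
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
) -> Dict[str, str]:
    """Poll every run until each reaches a terminal state; return {id: final state}."""
    pending = set(flow_run_ids)
    final: Dict[str, str] = {}
    deadline = time.monotonic() + timeout
    while True:
        for run_id in sorted(pending):
            state = get_state(run_id)
            if state in TERMINAL_STATES:
                final[run_id] = state
        # Stop polling runs that have already finished.
        pending.difference_update(final)
        if not pending:
            return final
        if time.monotonic() > deadline:
            raise TimeoutError(f"still not finished: {sorted(pending)}")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Fake state sequences standing in for real API responses.
    fake = {"run-a": iter(["Running", "Success"]), "run-b": iter(["Failed"])}
    print(wait_for_all(["run-a", "run-b"], lambda rid: next(fake[rid]), poll_interval=0))
```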
I'm a bit confused about why a new flow run would take its place. But I think using the
task will give you what you want, because it `raise`s the end state of the flow if you set
And then you can just use it like
outside of a flow
I updated the question to clarify the number of flows being run. I'm guessing we don't want to use `wait=True` since there are multiple flow runs.
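If a blocking "wait until finished" call is available (like the `wait=True` behaviour mentioned above), several runs can still be waited on at once by giving each wait its own thread. A minimal sketch, assuming a hypothetical `wait_for_run(flow_run_id)` that blocks until the run ends and returns its final state name; `fake_wait` is only a stand-in for testing.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def wait_concurrently(
    flow_run_ids: List[str],
    wait_for_run: Callable[[str], str],  # hypothetical blocking waiter
    max_workers: int = 8,
) -> Dict[str, str]:
    """Block until every run has finished; return {id: final state}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip pairs each id with its state.
        return dict(zip(flow_run_ids, pool.map(wait_for_run, flow_run_ids)))


def fake_wait(run_id: str) -> str:
    """Stand-in for a real blocking wait: sleeps briefly, then reports a state."""
    time.sleep(0.1)
    return "Failed" if run_id.endswith("b") else "Success"


if __name__ == "__main__":
    print(wait_concurrently(["run-a", "run-b"], fake_wait))
```

The total wait is roughly the slowest run rather than the sum of all runs, since the waits overlap.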
OK, that makes sense. I'm still a bit confused about what is starting a new flow run. Do you check for failure and then restart it, or is Prefect restarting it automatically (I don't think we do that)?
We check for failure but we do not manually restart. Let me verify what we're seeing here.
Here are the highlights of the Prefect logs:
- 12:09pm (some normal log output from the flow)
- 12:12pm No heartbeat detected from the remote task; marking the run as failed.
- 12:28pm Rescheduled by a Lazarus process. This is attempt 1.
- 12:28pm Submitted for execution: Task arnawsecs:
- 12:29pm Beginning Flow run for xxx
- 12:29pm Flow run FAILED: some reference tasks failed.
Here is the GUI timeline
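Because the Lazarus reschedule shows up as a log message, one rough way to tell whether a run was re-submitted (and how many times) is to scan its log messages for that line. This is a sketch assuming you have already fetched the messages as strings (fetching is not shown) and that the wording matches the log excerpt above, which may differ between Prefect versions.

```python
import re
from typing import Iterable, Optional

# Built from the message in the log excerpt; the exact wording is an assumption
# and may differ between Prefect versions.
LAZARUS_RE = re.compile(r"Rescheduled by a Lazarus process\. This is attempt (\d+)")


def lazarus_attempts(messages: Iterable[str]) -> Optional[int]:
    """Return the highest Lazarus attempt number in the messages, or None if never rescheduled."""
    highest: Optional[int] = None
    for msg in messages:
        match = LAZARUS_RE.search(msg)
        if match:
            attempt = int(match.group(1))
            highest = attempt if highest is None else max(highest, attempt)
    return highest


if __name__ == "__main__":
    logs = [
        "No heartbeat detected from the remote task; marking the run as failed.",
        "Rescheduled by a Lazarus process. This is attempt 1.",
        "Flow run FAILED: some reference tasks failed.",
    ]
    print(lazarus_attempts(logs))
```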
Got it, that really helps. Lazarus kicks in if the flow can't find the underlying compute to execute on (Kubernetes, or in this case ECS), and it re-submits the flow. As for your question on how to get the state: I think you will need to use the GraphQL API. You can search by `flow_id`, or by flow name and project, get the latest flow run, and then check its state.
You can query with something like this (though this is for starting flow runs). The point is to use the
method with your query to pull the info
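A sketch of the "latest flow run by flow name and project" lookup. The query shape and field names (`flow_run`, `flow`, `project`, `state`, `created`) are assumptions about the Prefect Server GraphQL schema and should be checked against your server. The query would be sent through the client's GraphQL method mentioned above, which is not executed here; `latest_state` only parses a plain response dict of the assumed shape.

```python
from typing import Any, Dict, Optional

# Assumed query shape for Prefect Server's GraphQL API; verify the field names
# (flow_run, flow, project, state, created) against your server's schema.
LATEST_RUN_QUERY = """
query LatestRun($flow_name: String!, $project_name: String!) {
  flow_run(
    where: {
      flow: {
        name: { _eq: $flow_name }
        project: { name: { _eq: $project_name } }
      }
    }
    order_by: { created: desc }
    limit: 1
  ) {
    id
    state
  }
}
"""


def latest_state(response: Dict[str, Any]) -> Optional[str]:
    """Return the state of the newest flow run in a GraphQL response, or None if there is none."""
    runs = (response.get("data") or {}).get("flow_run") or []
    return runs[0]["state"] if runs else None


if __name__ == "__main__":
    # Example response shaped like the query above (made-up values).
    sample = {"data": {"flow_run": [{"id": "run-123", "state": "Failed"}]}}
    print(latest_state(sample))
```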
Reading the Lazarus docs, it says Lazarus runs once every 10 minutes. Would it be easier for us to disable Lazarus for these flow runs and then create a new flow run as soon as we notice one has failed?
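The approach proposed here (Lazarus disabled, a new run created as soon as a failure is seen) could look roughly like this. `create_run` and `get_state` are hypothetical callables standing in for your existing create-and-poll calls, and the terminal state names are assumed. Only `Failed` triggers a new run; other terminal states are returned as-is.

```python
import time
from typing import Callable, NamedTuple

TERMINAL_STATES = {"Success", "Failed", "Cancelled"}  # assumed state names


class Outcome(NamedTuple):
    run_id: str
    state: str
    attempts: int


def run_with_resubmit(
    create_run: Callable[[], str],    # hypothetical: creates a flow run, returns its id
    get_state: Callable[[str], str],  # hypothetical: flow run id -> state name
    max_attempts: int = 3,
    poll_interval: float = 10.0,
) -> Outcome:
    """Create a run; if it ends Failed, create a fresh one, up to max_attempts runs."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        run_id = create_run()
        state = get_state(run_id)
        # Poll until this run reaches a terminal state.
        while state not in TERMINAL_STATES:
            time.sleep(poll_interval)
            state = get_state(run_id)
        if state != "Failed":
            break
    return Outcome(run_id, state, attempt)


if __name__ == "__main__":
    ids = iter(["run-1", "run-2"])
    states = {"run-1": iter(["Running", "Failed"]), "run-2": iter(["Success"])}
    print(run_with_resubmit(lambda: next(ids), lambda r: next(states[r]), poll_interval=0))
```

Resubmitting from the caller keeps retry timing under your control instead of waiting for the next Lazarus cycle, at the cost of each retry being a separate flow run with its own id.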
If that works for you, yep you can do that. Do you know how to disable Lazarus?
I don't know how to disable Lazarus
You go to the Flow settings and then there is a toggle to turn it off
Can it be disabled from the ECSRun parameters?
Found the settings
I think the UI is easier. Yeah, you can disable it there.