How to track reruns? We've been calling `client.c...
# prefect-server
j
How to track reruns? We've been calling
client.create_flow_run()
to create several flow runs (using ECSRun), and then polling each with
client.get_flow_run_state
to know when all the flows have completed. When one of the flows fails (and prefect starts a new flow run to take its place), how can we check when the rerun is complete (and whether it succeeded)?
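For reference, the polling pattern described above might look roughly like the sketch below. It assumes a Prefect 1.x `Client` whose `get_flow_run_state(flow_run_id)` returns a state object with an `is_finished()` method; the helper name `wait_for_flow_runs` and the `poll_seconds` parameter are made up for illustration.

```python
import time
from typing import Any, Dict, Iterable


def wait_for_flow_runs(
    client: Any, flow_run_ids: Iterable[str], poll_seconds: float = 10.0
) -> Dict[str, Any]:
    """Poll each flow run until every one reports a finished state.

    `client` is assumed to behave like prefect.Client (Prefect 1.x).
    Returns a mapping of flow run ID to the first finished state seen.
    """
    pending = set(flow_run_ids)
    final_states: Dict[str, Any] = {}
    while pending:
        # Iterate over a copy so finished runs can be removed in the loop.
        for run_id in list(pending):
            state = client.get_flow_run_state(run_id)
            if state.is_finished():
                final_states[run_id] = state
                pending.discard(run_id)
        if pending:
            time.sleep(poll_seconds)
    return final_states
```

Note that a loop like this stops watching a run as soon as it reports a finished state, so a run that is marked failed and rescheduled afterwards would already have been counted as done.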
k
A bit confused why a new flow run will take its place? But I think using the
StartFlowRun
task will give you what you want because it `raise`s the end state of the flow if you set
wait=True
And then you can just use it like
StartFlowRun(…).run(...)
outside of a flow
j
Updated question to clarify the number of flows being run. Guessing we don't want to use wait=True since it's multiple flow runs.
k
Ok that makes sense. I am still a bit confused what is starting a new flow run? Do you check for failure and then restart it? Is Prefect automatically restarting that (I don’t think we do)?
j
We check for failure but we do not manually restart. Let me verify what we're seeing here.
Here are the highlights of the prefect logs:
12:09pm (Some normal log output from the flow)
12:12pm No heartbeat detected from the remote task; marking the run as failed.
12:28pm Rescheduled by a Lazarus process. This is attempt 1.
12:28pm Submitted for execution: Task arnawsecs:
12:29pm Beginning Flow run for xxx
12:29pm Flow run FAILED: some reference tasks failed.
Here is the GUI timeline
k
Got it. That really helps. So Lazarus kicks in if the Flow can’t find the underlying compute to execute (Kubernetes or in this case ECS). Lazarus will re-submit the flow. Now to your question on how to get state. Basically you will need to use the GraphQL API I think. And then you can search by flow_id or by name and project, get the latest flow run, and then check the state.
You can query with something like this (though this is for starting Flow Runs). The point is to use the
client.graphql()
method with your query to pull the info
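A minimal sketch of that idea, looking up the latest flow run for a flow and reading its state. It assumes the Prefect 1.x `Client.graphql(query, variables=...)` call and the Hasura-style `flow_run` table exposed by Prefect Server (fields `id`, `name`, `state`, `flow_id`, `created`); the helper name `latest_flow_run` is made up for illustration, and the exact query shape may need adjusting against your server's schema.

```python
from typing import Any, Optional

# Assumed query shape: newest flow run for a given flow, by creation time.
LATEST_FLOW_RUN_QUERY = """
query LatestFlowRun($flow_id: uuid!) {
  flow_run(
    where: {flow_id: {_eq: $flow_id}}
    order_by: {created: desc}
    limit: 1
  ) {
    id
    name
    state
  }
}
"""


def latest_flow_run(client: Any, flow_id: str) -> Optional[dict]:
    """Return the most recently created flow run for `flow_id`, or None.

    `client` is assumed to behave like prefect.Client (Prefect 1.x),
    whose graphql() result can be indexed like a nested mapping.
    """
    result = client.graphql(LATEST_FLOW_RUN_QUERY, variables={"flow_id": flow_id})
    runs = result["data"]["flow_run"]
    return runs[0] if runs else None
```

With a real client this would be called as `latest_flow_run(Client(), flow_id)`, and the returned `state` field (for example `"Failed"` or `"Success"`) can then be checked.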
j
Reading the lazarus docs, it says lazarus runs once every 10 minutes. Would it be easier for us to disable lazarus for these flow runs, and then we could create a new flow run as soon as one is noticed to have failed?
k
If that works for you, yep you can do that. Do you know how to disable Lazarus?
j
I don't know how to disable Lazarus
k
You go to the Flow settings and then there is a toggle to turn it off
j
Can it be disabled from the ECSRun parameters?
Found the settings
k
I think the UI is easier. Yeah you can disable it there