Is there a way to do retries on entire/scheduled flows?
# ask-community
h
Is there a way to do retries on entire/scheduled flows? Retries on tasks work well, but I have had a flow fail because the executor is a remote cluster, so it doesn't even reach the task stage. In my case, a Coiled cluster failed to spin up for some reason (actually the first time that has happened).
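For reference, this is roughly how the task-level retries are set up (assuming Prefect 1.x; the task body and delay here are illustrative):
```python
from datetime import timedelta

from prefect import task, Flow


# Task-level retries: Prefect re-runs a failed task up to max_retries times,
# but only once the run has actually reached the task stage.
@task(max_retries=3, retry_delay=timedelta(minutes=1))
def fetch_data():
    ...


with Flow("example-flow") as flow:
    fetch_data()
```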
k
Hey @Henning Holgersen, the issue we’ve had so far is that flow-level retries mean different things to different people. Did the Lazarus process kick in, judging by your logs? In cases where you don’t get infrastructure, I think the Lazarus process should attempt to retry.
h
No trace of Lazarus; the logs note a `coiled.errors.ServerError: Could not launch scheduler for dask cluster` error. The Coiled dashboard shows signs of a cluster at that time, but not actually running. So it’s consistent like that…
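For context, the flow’s executor points at a Coiled cluster roughly like this (the cluster kwargs are illustrative, not my exact config):
```python
import coiled
from prefect import Flow
from prefect.executors import DaskExecutor

with Flow("coiled-flow") as flow:
    ...  # tasks go here

# The Dask cluster is created when the flow run starts; if Coiled cannot
# launch the scheduler, the run fails before any task runs, so task-level
# retries never get a chance to apply.
flow.executor = DaskExecutor(
    cluster_class=coiled.Cluster,
    cluster_kwargs={"n_workers": 4, "software": "my-coiled-env"},  # illustrative
)
```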
k
Gotcha. Actually I bumped into this ServerError myself yesterday. Will ask the team what ideas they have.
Ok so this situation is a bit tricky to get to restart automatically, because doing a blind restart with a state handler upon failure would also apply when the Flow fails due to data errors, which could cause an infinite loop. You would need something like this:
1. Create a record in the KV Store and set it to true.
2. Make the first task of the flow set the KV Store flag to false. This is your indication that the executor came up successfully.
3. If the flow fails and the KV Store flag is still true, that shows the executor didn’t start, and you can either use create_flow_run to kick it off again or set_flow_state to change it from Failed to Scheduled to run again. (A sketch of this pattern is below.)
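A minimal sketch of that pattern, assuming Prefect 1.x with the Prefect Cloud KV Store. The key name executor-startup-flag is made up, and this version uses Client().create_flow_run rather than a state change to kick off the new run; resetting the flag on the Running transition covers step 1, since that transition happens before the executor is created:
```python
import prefect
from prefect import task, Flow
from prefect.backend import get_key_value, set_key_value
from prefect.client import Client

FLAG_KEY = "executor-startup-flag"  # hypothetical KV Store key


@task
def clear_startup_flag():
    # Step 2: first task of the flow. It only runs once the executor
    # (the Coiled/Dask cluster) has actually come up.
    set_key_value(key=FLAG_KEY, value="false")


def restart_if_executor_failed(flow, old_state, new_state):
    # Step 1: reset the flag when the run starts, before the executor exists.
    if new_state.is_running():
        set_key_value(key=FLAG_KEY, value="true")
    # Step 3: on failure, restart only if the flag is still "true", i.e. the
    # first task never ran because the cluster never started. Data errors
    # leave the flag at "false", so they are not retried (no infinite loop).
    if new_state.is_failed() and get_key_value(FLAG_KEY) == "true":
        flow_id = prefect.context.get("flow_id")
        if flow_id:
            Client().create_flow_run(flow_id=flow_id)
    return new_state


with Flow("coiled-flow", state_handlers=[restart_if_executor_failed]) as flow:
    flag_cleared = clear_startup_flag()
    # downstream tasks should depend on flag_cleared so it always runs first
```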