Robert Banick

07/12/2023, 5:19 PM
Hi all, I’ve set up an Automation to re-run flows that crash. This nicely catches random crashes but goes haywire when an error inherent to the flow causes crashes — the flow will re-run and crash ad infinitum. Is there any way to limit the number of times an Automation is triggered? In this case I’d like Prefect to try re-running only once, then stand down.

Christopher Boyd

07/12/2023, 5:54 PM
I don’t think so, but this is good feedback - I think something like a `limit` could be good here
@Will Raphaelson ^^ ?

Robert Banick

07/12/2023, 5:58 PM
Digging around I can find an alternative fix, which is to put in place another Automation which stops the work queue after a set number of failures/crashes, as per here https://github.com/anna-geller/prefect-cloud-automations/blob/main/flow_runs/pause_work_queue_after_3_failures.py
But burning one automation to “fix” another is pretty unsatisfying given I’m limited to 3 automations at present

Will Raphaelson

07/12/2023, 5:59 PM
yeah this is good feedback, thank you. I see two potential approaches with the current features.
• Use retries in the flow decorator. This will only retry up to x times and then it will fail terminally. Could be annoying to add this to all relevant flows.
• Set up another automation as a watcher, as you wrote above.
yeah I hear you - it’s suboptimal right now. I’ll note this for future enhancements.
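Editor's sketch: the behavior asked for in this thread ("try re-running only once, then stand down") comes down to keeping a per-flow count of re-runs and refusing once a cap is reached. The plain-Python snippet below illustrates only that bookkeeping. It does not use or describe any Prefect API; `RerunBudget`, `should_rerun`, `reset`, and the flow name `nightly-etl` are hypothetical names invented for illustration.

```python
from collections import defaultdict


class RerunBudget:
    """Tracks how many times each flow has been re-run after a crash.

    Hypothetical helper for illustration only; not part of Prefect.
    """

    def __init__(self, max_reruns: int = 1) -> None:
        if max_reruns < 0:
            raise ValueError("max_reruns must be non-negative")
        self.max_reruns = max_reruns
        self._reruns = defaultdict(int)  # flow name -> re-runs used so far

    def should_rerun(self, flow_name: str) -> bool:
        """Consume one re-run and return True if budget remains, else False."""
        if self._reruns[flow_name] >= self.max_reruns:
            return False  # cap reached: stand down instead of looping
        self._reruns[flow_name] += 1
        return True

    def reset(self, flow_name: str) -> None:
        """Clear the count, e.g. after the flow completes successfully."""
        self._reruns.pop(flow_name, None)


# Simulate three consecutive crashes of the same (hypothetical) flow.
budget = RerunBudget(max_reruns=1)
decisions = [budget.should_rerun("nightly-etl") for _ in range(3)]
print(decisions)  # [True, False, False]
```

With `max_reruns=1`, a random crash still gets one more attempt, while a crash inherent to the flow stops after that single re-run rather than repeating indefinitely.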

Robert Banick

07/12/2023, 6:00 PM
Thanks for the quick reply both, I appreciate your taking this into account for the future
@Will Raphaelson on your point 1: since this is a crash that happens pre-flow, I don’t think a retry would work? In this specific case we accidentally changed the requirements such that it loaded in a bad library via `pip` at runtime, causing everything to burn down before the flow even loaded.

Will Raphaelson

07/12/2023, 6:04 PM
oh that’s a good callout - let me see if there is a technical reason we don’t support retries on crashed states. If we did, we’d get this counting “for free”, which would be useful.

Robert Banick

07/12/2023, 6:07 PM
You do support retries on crashed states! You just don’t support stopping them ^_^
So our poor notifications gave us a heart attack last night when hundreds of crashes suddenly appeared b/c it kept retrying a compromised flow

Will Raphaelson

07/12/2023, 6:08 PM
wouldn’t @flow(retries=5) get you where you need to go?

Robert Banick

07/12/2023, 6:09 PM
ah OK I misunderstood — I’m not sure whether flow retries support crashed states or not. I thought you were referring to automations.
We can certainly try the `retries` parameter and see where that gets us.

Will Raphaelson

07/12/2023, 6:10 PM
yeah let me know how that goes. I’m almost positive that this will work well, and if it doesn’t we should build support for it, and it shouldn’t be too hard.

Robert Banick

07/12/2023, 6:43 PM
`retries` as a flow parameter does not appear to trigger (or limit) retries on crashes
The automation will indeed trigger retries, but the `retries` flow parameter will do nothing to limit the automation. They appear to work in different ways

Will Raphaelson

07/12/2023, 7:19 PM
thanks for trying that, Robert. If you want to open a GitHub issue in our repo for supporting crashed states on flow retries, we can discuss and prioritize there!

Robert Banick

07/12/2023, 7:19 PM
Sounds good, will do
It’s up, let me know if I can add any additional helpful detail: https://github.com/PrefectHQ/prefect/issues/10211

Will Raphaelson

07/12/2023, 9:36 PM
thanks!