
Zachary Loertscher

08/23/2023, 2:04 PM
Hey all - has anybody had success detecting hanging flows? I have the following automation set up on Prefect Cloud (Prefect v2.8.7), but my flow ran for 6.5 hours this morning and nothing was sent to our Slack channel to notify us (the 2nd time this has happened). Is there a workaround others use to detect hanging flows, or am I missing something on the automation side?
Flow run id: 6c1ee67b-db06-45ea-8a6a-9585f7cf1d52
Automation id: 5d633fec-fb09-409f-b1fa-5270dcc429f2

Chris Guidry

08/23/2023, 2:12 PM
Hi Zach, taking a look now...
Ahhh, yes sorry, this is a common source of mix-ups when using the "Stays In" (aka `Proactive`) automations. Your automation here says: "If a flow run emits any one of the `Resuming`/`Running`/`Pending`/`Late` events, and then doesn't emit another event for the next 4 hours, trigger the actions..." The problem comes from including both `Pending` and `Running` in the same automation: most flow runs go from `Pending` to `Running` quite quickly, so the `Running` event "clears out" the automation after it started from the `Pending` event. Here's the sequence:
07:00:05.518905 UTC -> `Pending`
• The automation starts watching for subsequent events
07:00:05.705739 UTC -> `Running`
• The automation says "cool, we got an event, so it's not stuck in `Pending`", then closes itself out
You may want to create a second automation that just catches things stuck in `Pending`, separately from the `Running`/`Resuming`/`Late` ones.
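To make the sequence above concrete, here is a minimal toy model of the "Stays In" behavior exactly as described in this thread: a watch opens on the first `after` event, an expected event closes it (and, per the thread, does not open a new one), and the trigger fires only if no expected event arrives within the window. This is an illustration under those assumptions, not Prefect's implementation; `proactive_fires`, `any_flow_run_event`, and the calendar date are hypothetical.

```python
from datetime import datetime, timedelta


def proactive_fires(events, after, expect, within, now):
    """Toy model of a "Stays In" (Proactive) trigger for one flow run.

    events: (timestamp, event_name) tuples, sorted by time.
    after:  set of event names that open a watch.
    expect: predicate; an expected event closes an open watch.
    within: how long an open watch waits before firing.
    now:    the time at which the trigger is evaluated.
    """
    opened_at = None
    for ts, name in events:
        if opened_at is not None:
            if ts - opened_at >= within:
                return True  # window elapsed with no expected event
            if expect(name):
                # Cleared; per the thread, this event does not open a new watch.
                opened_at = None
                continue
        if opened_at is None and name in after:
            opened_at = ts
    # Fire only if a watch is still open and the window has elapsed by `now`.
    return opened_at is not None and now - opened_at >= within


AFTER = {
    "prefect.flow-run.Pending",
    "prefect.flow-run.Running",
    "prefect.flow-run.Resuming",
    "prefect.flow-run.Late",
}


def any_flow_run_event(name):
    # Stands in for expecting `prefect.flow-run.*` (any flow run event).
    return name.startswith("prefect.flow-run.")


WINDOW = timedelta(hours=4)
# Times from the thread; the date is an assumption.
pending = datetime(2023, 8, 23, 7, 0, 5, 518905)
running = datetime(2023, 8, 23, 7, 0, 5, 705739)
events = [
    (pending, "prefect.flow-run.Pending"),
    (running, "prefect.flow-run.Running"),
]
now = pending + timedelta(hours=6, minutes=30)  # the run was still going

# Prints False: Running closed the watch that Pending opened.
print(proactive_fires(events, AFTER, any_flow_run_event, WINDOW, now))
```

In this model, dropping the `Pending` event (so the run's first watched event is `Running`) makes the same 6.5-hour run fire, which matches the suggestion to keep `Pending` in a separate automation.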

Zachary Loertscher

08/23/2023, 2:27 PM
Ah, I see - so once it has transitioned from `Pending` -> `some_other_state`, the automation stops watching for hangs? I'll remove "Pending" from that list in that case, then. Unfortunately we are at our 3-automation limit, so I cannot add another automation. Sounds like there currently isn't a way for the automation to continue monitoring, even when a status changes? Or maybe watching multiple statuses in a single automation isn't supported yet?

Chris Guidry

08/23/2023, 2:31 PM
Sorry, yes that's accurate: an automation will start on any one of those events, and then clear when it sees one of the expected events (in this case `prefect.flow-run.*`, i.e. any flow run event). Another way you can approach this is to switch to a custom trigger, not using the UI, and use something like this:
```
after: ['prefect.flow-run.Pending', 'prefect.flow-run.Running', 'prefect.flow-run.Resuming', 'prefect.flow-run.Late']

expect: ['prefect.flow-run.Completed', 'prefect.flow-run.Crashed', 'prefect.flow-run.Failed', ...]
```
Here, instead of looking for any event, you set it to look for "final" events. So it can start on any of those pending/running/etc. events, and will only clear if the run gets to completed/crashed/failed/etc. within 4 hours. I'll follow up in DM about the 3-automation limit...
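A similar toy model shows why expecting only final events helps: a later non-final event such as `Running` no longer clears the watch, so a run that never finishes trips the trigger once the 4-hour window passes. This is a sketch of the behavior described in the thread, not Prefect's API; `fire_time`, `TERMINAL`, and the calendar date are hypothetical, and because the original `expect` list ends in `...`, only the three final states it names are included.

```python
from datetime import datetime, timedelta

AFTER = {
    "prefect.flow-run.Pending",
    "prefect.flow-run.Running",
    "prefect.flow-run.Resuming",
    "prefect.flow-run.Late",
}
# Only the final states named in the snippet; extend with any others you use.
TERMINAL = {
    "prefect.flow-run.Completed",
    "prefect.flow-run.Crashed",
    "prefect.flow-run.Failed",
}


def fire_time(events, within, now):
    """Return when a final-state-only watch would fire, or None.

    events: (timestamp, event_name) tuples for one flow run, sorted by time.
    """
    opened_at = None
    for ts, name in events:
        if opened_at is not None and ts - opened_at >= within:
            return opened_at + within  # window elapsed before this event
        if name in TERMINAL:
            opened_at = None  # run reached a final state in time
        elif opened_at is None and name in AFTER:
            opened_at = ts  # first watched event opens the watch
    if opened_at is not None and now - opened_at >= within:
        return opened_at + within
    return None


WINDOW = timedelta(hours=4)
pending = datetime(2023, 8, 23, 7, 0, 5, 518905)  # date is an assumption
running = datetime(2023, 8, 23, 7, 0, 5, 705739)
events = [
    (pending, "prefect.flow-run.Pending"),
    (running, "prefect.flow-run.Running"),
]
now = pending + timedelta(hours=6, minutes=30)

print(fire_time(events, WINDOW, now))  # 2023-08-23 11:00:05.518905

done = events + [(running + timedelta(hours=2), "prefect.flow-run.Completed")]
print(fire_time(done, WINDOW, now))  # None: completed within the window
```

The design choice is the same one Chris describes: the watch opens on any non-final event, but only a final event can close it.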

Zachary Loertscher

08/23/2023, 2:34 PM
Awesome, this is great - I'll switch this over to a custom trigger in that case 🙌