Michael Holzapfel

11/21/2022, 7:46 AM
Hi everyone, I'm trying out Prefect 2 with a self-hosted Orion server, and I noticed that an agent crash left behind flow runs stuck in the "Running" state. These in turn interfere with or block execution for the respective work queue if a concurrency limit is set. Is there a canonical way to deal with such a situation?

Michiel Verburg

11/21/2022, 9:53 AM
This is interesting, I would like to know the answer to that as well. I looked into concurrency limits before and essentially could not use them because of what you mention.

Christopher Boyd

11/22/2022, 5:41 PM
Where is your agent running? Once a flow is mid-run, the flow run itself, executing in its own infrastructure, is responsible for updating and setting its state. As far as I know, an agent crash should only leave runs stuck like this when the agent and the flows are both running locally.

Michael Holzapfel

11/23/2022, 8:12 AM
The symptom is independent of the agent's location or the infrastructure used. My question is how to deal with "zombies", i.e. flow runs that have been started but, due to an incident (agent crash, careless trainee, hardware failure, power outage, ...), never changed their state. Especially when using concurrency limits, these zombies keep blocking the work queue and can have a wide impact on the entire system. Are there any methods or recommendations for detecting these zombies that I'm missing?
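
To make the detection idea concrete, here is a minimal sketch of the kind of check I have in mind. It is not Prefect's API: `FlowRunRecord`, `find_stale_runs` and the `max_age` threshold are hypothetical names, and the records are assumed to have been fetched from the Orion server beforehand. It simply flags runs that still claim to be running after longer than any healthy run should take.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class FlowRunRecord:
    """Hypothetical, simplified view of a flow run (not a Prefect class)."""
    id: str
    state: str             # assumed state name, e.g. "RUNNING" or "COMPLETED"
    start_time: datetime   # assumed: when the run started, timezone-aware


def find_stale_runs(
    runs: Iterable[FlowRunRecord],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[FlowRunRecord]:
    """Return runs that are still RUNNING but started more than max_age ago."""
    now = now or datetime.now(timezone.utc)
    return [
        run for run in runs
        if run.state == "RUNNING" and now - run.start_time > max_age
    ]


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    now = datetime(2022, 11, 23, 8, 0, tzinfo=timezone.utc)
    runs = [
        FlowRunRecord("a", "RUNNING", now - timedelta(hours=12)),
        FlowRunRecord("b", "RUNNING", now - timedelta(hours=1)),
        FlowRunRecord("c", "COMPLETED", now - timedelta(days=2)),
    ]
    stale = find_stale_runs(runs, max_age=timedelta(hours=6), now=now)
    print([run.id for run in stale])  # ['a']
```

A pure age threshold will also flag legitimately long-running flows, so `max_age` would have to be chosen per work queue. How the flagged runs are then moved to a terminal state so they stop counting against the concurrency limit is left out here.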