# ask-community
Hi, when a worker gets killed or crashes while it is executing a flow run, the flow run gets stuck in the `running` state. Because this is confusing and inconsistent, I'd like to resolve it and get consistent states on our self-hosted Prefect server. I was hoping that migrating from agents to workers would fix it, but even though workers send heartbeats and the server recognises them as unhealthy/dead, the flow run still remains in `running`. I can work around this by updating flow run states manually from external processes (e.g. k8s lifecycle hooks on pods), but I would like to avoid managing flow state from outside Prefect itself. I'd be curious to learn how others are approaching this issue. Maybe I'm still missing something in Prefect; my expectation was that it would mark flow runs from dead workers as `failed`.
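The expectation above (runs whose worker has died should end up `failed`) boils down to a heartbeat check that an external watchdog could run. Below is a minimal sketch of that check only. It is not Prefect's API: `WorkerInfo`, `FlowRunInfo`, `find_orphaned_runs` and the assumption that each run records which worker picked it up are all hypothetical, and the 90-second timeout is an arbitrary example value. The actual state update is left to whatever mechanism the watchdog already uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class WorkerInfo:
    # Hypothetical record: a worker's name and its most recent heartbeat time.
    name: str
    last_heartbeat: datetime


@dataclass
class FlowRunInfo:
    # Hypothetical record: assumes we know which worker picked up the run.
    id: str
    state: str
    worker_name: str


def find_orphaned_runs(runs, workers, now, heartbeat_timeout):
    """Return ids of `running` flow runs whose worker is unknown or has a stale heartbeat."""
    last_seen = {w.name: w.last_heartbeat for w in workers}
    orphaned = []
    for run in runs:
        if run.state != "running":
            continue  # only runs stuck in `running` are candidates
        heartbeat = last_seen.get(run.worker_name)
        # A missing worker, or one silent for longer than the timeout, is treated as dead.
        if heartbeat is None or now - heartbeat > heartbeat_timeout:
            orphaned.append(run.id)
    return orphaned


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    workers = [
        WorkerInfo("worker-a", now - timedelta(seconds=10)),
        WorkerInfo("worker-b", now - timedelta(minutes=5)),
    ]
    runs = [
        FlowRunInfo("run-1", "running", "worker-a"),
        FlowRunInfo("run-2", "running", "worker-b"),
        FlowRunInfo("run-3", "running", "worker-c"),
        FlowRunInfo("run-4", "completed", "worker-b"),
    ]
    print(find_orphaned_runs(runs, workers, now, timedelta(seconds=90)))
    # → ['run-2', 'run-3']
```

Here `run-2` is flagged because its worker's last heartbeat is five minutes old, and `run-3` because its worker is not known to the server at all. `run-4` is ignored because it already reached a terminal state.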
Does anyone have thoughts on how they are addressing this issue of flows getting stuck in an inconsistent `running` state on worker restarts, crashes, etc.? It seems like something that many people would run into.