# prefect-community
c
hey all again! Do you have recommendations on how to handle an Agent killed due to OOM? We currently see Flows stay in a Running state, then we cancel via the API, which leaves them in a Cancelling state
k
Hey @Christian Nuss, what type of agent is that?
c
it's a bit jerry-rigged at the moment, currently it's a local agent
jerry-rigged == running a Prefect agent in local mode on Kubernetes, so everything is happening in a single container 🤦‍♂️ (working on fixing that), so it's not surprising we're OOM'ing, but I'd like to get flows into a good final state before proceeding
k
I think this will fix itself anyway when you move to the k8s agent. The Flow has a heartbeat that regularly pings Prefect to show it is alive. When you run into OOM, the flow and its heartbeat die, and Prefect marks the Flow as Failed after enough missed heartbeats. So Prefect should eventually mark it as Failed on its own and you should not need to edit the state. Does that make sense?
c
total sense, thank you! how many heartbeats does it take? something I cancelled 3 hrs ago is still Cancelling
k
Well, I am unsure this Cancelling state will resolve on its own, so I would just mark it as Failed. Missed heartbeats should mark a run as Failed in about 20 minutes, I think
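To make the heartbeat idea above concrete, here is a minimal sketch of how a service could flag runs whose heartbeats have stopped. It is not Prefect's actual code: `FlowRun`, `fail_stale_runs`, and `HEARTBEAT_TIMEOUT` are hypothetical names, and the 20 minute timeout is only the rough guess given in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

# Assumption: the thread guesses "like 20 mins"; the real threshold may differ.
HEARTBEAT_TIMEOUT = timedelta(minutes=20)


@dataclass
class FlowRun:
    """Hypothetical record for illustration, not a Prefect type."""
    id: str
    state: str
    last_heartbeat: datetime


def fail_stale_runs(
    runs: Iterable[FlowRun],
    now: datetime,
    timeout: timedelta = HEARTBEAT_TIMEOUT,
) -> List[str]:
    """Mark Running runs with no heartbeat within `timeout` as Failed.

    Returns the ids of the runs that were changed.
    """
    failed = []
    for run in runs:
        # A run killed by OOM stops sending heartbeats, so its last
        # heartbeat timestamp falls further and further behind `now`.
        if run.state == "Running" and now - run.last_heartbeat > timeout:
            run.state = "Failed"
            failed.append(run.id)
    return failed
```

This sketch only inspects runs in a Running state. Whether the real service handles a run that is already Cancelling the same way is not confirmed in this thread, which is why setting the state by hand is suggested for that case.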
c
curious: which piece of the stack is responsible for this? graphql, apollo or towel?
k
Towel
See this
c
thank you! any tips on how to mark it as failed?
k
There is a set state option in the UI where you can set the state of a Task or Flow
c
ooo nice thanks!