
Christian Nuss

03/02/2022, 8:33 PM
hey all again! Do you have recommendations on how to handle an Agent killed due to OOM? We currently see Flows stay in a Running state, then we cancel them via the API, which leaves them in a Cancelling state

Kevin Kho

03/02/2022, 8:35 PM
Hey @Christian Nuss, what type of agent is that?

Christian Nuss

03/02/2022, 8:37 PM
it's a bit jerry-rigged at the moment, currently it's a local agent
jerry-rigged == running a Prefect agent in local mode on Kubernetes, so everything is happening in a single container 🤦‍♂️ (working on fixing that), so it's not surprising we're OOM'ing, but I'd like to get flows into a good final state before proceeding

Kevin Kho

03/02/2022, 8:47 PM
I think this will fix itself anyway when you move to the k8s agent. The Flow has a heartbeat that periodically pings Prefect to signal that it is still alive. When you run into an OOM, the flow and its heartbeat die, and Prefect marks the Flow as Failed after enough missed heartbeats, so Prefect should eventually mark it as Failed and you should not need to edit the state yourself. Does that make sense?

Christian Nuss

03/02/2022, 8:49 PM
total sense, thank you! how many heartbeats does it take? something I cancelled 3 hrs ago is still Cancelling

Kevin Kho

03/02/2022, 8:49 PM
Well, I'm not sure this Cancelling state will resolve on its own, so I would just mark it as Failed. Missed heartbeats should get a run marked as Failed in about 20 mins, I think
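To make the heartbeat idea concrete, here is a rough sketch of a "missed heartbeat" check. This is not Prefect's actual implementation: the `FlowRun` record, the `reap_stale_runs` helper, and the 20-minute timeout (taken from the "like 20 mins" guess above) are all assumptions for illustration. It also assumes only Running runs are checked, which would be consistent with a Cancelling run staying stuck.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Assumption: mirrors the rough "20 mins" figure from the thread, not a confirmed default.
HEARTBEAT_TIMEOUT = timedelta(minutes=20)


@dataclass
class FlowRun:
    """Hypothetical record of a flow run as a backend might track it."""
    id: str
    state: str
    last_heartbeat: datetime


def reap_stale_runs(
    runs: List[FlowRun],
    now: datetime,
    timeout: timedelta = HEARTBEAT_TIMEOUT,
) -> List[str]:
    """Mark Running runs with a stale heartbeat as Failed and return their ids."""
    reaped = []
    for run in runs:
        # Assumption: only Running runs are inspected; other states are left alone.
        if run.state == "Running" and now - run.last_heartbeat > timeout:
            run.state = "Failed"
            reaped.append(run.id)
    return reaped
```

In this sketch a run whose process was OOM-killed simply stops updating `last_heartbeat`, so the next check after the timeout moves it to Failed without anyone editing the state by hand.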

Christian Nuss

03/02/2022, 8:52 PM
curious: which piece of the stack is responsible for this? graphql, apollo or towel?

Kevin Kho

03/02/2022, 8:53 PM
Towel
See this

Christian Nuss

03/02/2022, 8:57 PM
thank you! any tips on how to mark it as failed?

Kevin Kho

03/02/2022, 8:58 PM
There is a set-state action in the UI where you can manually set the state of a Task or Flow
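If you would rather do the same thing from code than from the UI, something like the following might work. It is a minimal sketch assuming Prefect 1.x, where `Client.set_flow_run_state` and `prefect.engine.state.Failed` exist. The helper name `force_fail` and the default message are made up for illustration, and the client and state class are passed in so the snippet itself does not depend on Prefect being installed.

```python
from typing import Any, Callable


def force_fail(
    client: Any,
    flow_run_id: str,
    failed_cls: Callable[[str], Any],
    reason: str = "Agent was OOM-killed before the run finished",
) -> Any:
    """Set a stuck flow run to a Failed state through the given client.

    `client` is expected to behave like prefect.Client (Prefect 1.x) and
    `failed_cls` like prefect.engine.state.Failed; both are assumptions here.
    """
    state = failed_cls(reason)
    return client.set_flow_run_state(flow_run_id=flow_run_id, state=state)


# Usage, assuming Prefect 1.x is installed and configured for your backend:
#   from prefect import Client
#   from prefect.engine.state import Failed
#   force_fail(Client(), "<your-flow-run-id>", Failed)
```

Passing the client in also makes it easy to try the helper against a stub before pointing it at a real backend.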

Christian Nuss

03/02/2022, 8:59 PM
ooo nice thanks!