I have a flowrun with some curious logs. The logs ...
# ask-community
d
I have a flowrun with some curious logs. The logs claim that the heartbeat was not detected and the flow was cancelled, but according to the server it was still running and I cancelled it manually just now. The id is 99b7bd01-7b90-4634-9718-8e2eab1c00b0. Thank you.
a
Thanks for sharing your flow run ID! Can you try the solution from this thread? https://discourse.prefect.io/t/flow-is-failing-with-an-error-message-no-heartbeat-detected-from-the-remote-task/79
d
The no-heartbeat part was correct, and I know what caused that. The problem was that Prefect said it was cancelling the flow run, but the flow run never actually got cancelled. It was still in the Running state many hours later.
a
I see. This must be some infrastructure issue. If you need help troubleshooting, can you share the flow and which agent it runs on?
d
The flowrun is https://cloud.prefect.io/anaconda/flow-run/99b7bd01-7b90-4634-9718-8e2eab1c00b0 The agent it was running on definitely has some problems and is being replaced soon, so I wouldn’t be surprised if it was misbehaving. But I thought the server should have killed it; there was even a message from the zombie killer in there.
a
I see. It looks like you already know the answer 🙂 We generally avoid killing flow runs ourselves, since your flow run could be behaving fine while only the connection to the orchestration API went wrong.
The entire handling of such infrastructure issues, flow run cancellation, and resource termination is honestly super difficult, and I think tbh that Prefect handles it properly by being more cautious rather than starting to kill your pods without being 100% sure that this is absolutely necessary.