I have a flow run with some curious logs. The logs claim that no heartbeat was detected and that the flow run was cancelled, but according to the server it was still running, and I only just cancelled it manually. The ID is 99b7bd01-7b90-4634-9718-8e2eab1c00b0. Thank you.
The no-heartbeat part was correct, and I know what caused it. The problem was that Prefect said it was cancelling the flow run, but the flow run never actually got cancelled. It was still in the Running state many hours later.
03/29/2022, 4:27 PM
I see. This must be some infrastructure issue. If you need help troubleshooting, can you share the flow and tell us which agent it runs on?
I see. It looks like you already know the answer 🙂 We generally avoid killing flow runs ourselves, since your flow run may be behaving fine and only the connection to the orchestration API went wrong.
The entire handling of such infrastructure issues, including flow run cancellation and resource termination, is honestly super difficult. I think Prefect handles it properly by being cautious rather than starting to kill your pods without being 100% sure that this is absolutely necessary.