The information I found doesn't directly address automatic recovery when a Kubernetes flow run stalls because its pod was removed. However, Prefect does provide cancellation and infrastructure-monitoring capabilities that may point you in the right direction.
For instance, when cancellation is requested, Prefect moves the flow run to a "Cancelling" state and, if the run does not stop within a grace period, forcefully terminates it. In addition, Prefect's worker events feature can observe pod events, including pod evictions, and forward them to Prefect so they appear in the UI. This telemetry can help you determine whether a flow run stalled because of an underlying infrastructure problem.
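If you collect pod events yourself, a minimal sketch like the following could flag flow runs whose pods were evicted. It assumes a simplified event shape (`PodEvent` with `pod_name` and `reason`, loosely modeled on the `involvedObject.name` and `reason` fields of a Kubernetes Event) and a hypothetical `pod_to_flow_run` mapping from pod name to flow run ID; neither is part of Prefect's API. It treats the reason `"Evicted"` as the eviction signal, and other kinds of pod removal (for example preemption or a node drain) may surface under different reasons.

```python
from dataclasses import dataclass


@dataclass
class PodEvent:
    """Hypothetical, simplified stand-in for a Kubernetes Event."""
    pod_name: str  # plays the role of involvedObject.name on a real Event
    reason: str    # e.g. "Scheduled", "Started", "Evicted"


def evicted_flow_runs(events, pod_to_flow_run):
    """Return flow run IDs (in first-seen order) whose pod got an "Evicted" event.

    `pod_to_flow_run` is a hypothetical dict mapping pod name -> flow run ID.
    Events for pods that are not in the mapping are ignored.
    """
    evicted = []
    for event in events:
        if event.reason != "Evicted":
            continue  # only eviction events are of interest here
        run_id = pod_to_flow_run.get(event.pod_name)
        if run_id is not None and run_id not in evicted:
            evicted.append(run_id)
    return evicted


if __name__ == "__main__":
    sample = [
        PodEvent("job-a-xyz", "Started"),
        PodEvent("job-a-xyz", "Evicted"),
        PodEvent("job-b-abc", "Started"),
    ]
    print(evicted_flow_runs(sample, {"job-a-xyz": "run-1", "job-b-abc": "run-2"}))
```

The result could then feed an alert or a manual restart; how you obtain the events and the pod-to-run mapping depends on your setup.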
If you want automatic recovery, check Prefect's documentation on retries and resilience patterns, or set up monitoring and alerts so that such incidents can at least be detected and handled manually.
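To illustrate the monitoring idea, here is a minimal watchdog sketch that flags runs still reported as running but with no sign of life for longer than a threshold. The `(run_id, state_name, last_seen)` tuples and the `find_stalled_runs` helper are hypothetical; in practice `last_seen` might come from pod events or heartbeats you record yourself. What to do with a stalled run (alerting, or marking it as crashed and starting a new run) is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def find_stalled_runs(runs, now, max_silence):
    """Return IDs of runs in state "RUNNING" whose last sign of life is too old.

    `runs` is a list of hypothetical (run_id, state_name, last_seen) tuples,
    where `last_seen` is a timezone-aware datetime.
    """
    stalled = []
    for run_id, state_name, last_seen in runs:
        # A run counts as stalled only if it still claims to be running.
        if state_name == "RUNNING" and now - last_seen > max_silence:
            stalled.append(run_id)
    return stalled


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    sample = [
        ("run-1", "RUNNING", now - timedelta(minutes=30)),   # silent too long
        ("run-2", "RUNNING", now - timedelta(minutes=2)),    # recently active
        ("run-3", "COMPLETED", now - timedelta(hours=3)),    # already finished
    ]
    print(find_stalled_runs(sample, now, timedelta(minutes=10)))
```

Keeping the check a pure function of its inputs makes it easy to run on a schedule and to test without a cluster.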
You can read more about cancellation and worker events here:
- Flow Cancellation
- Worker Events Blog Post
If you need further help, searching Prefect's GitHub issues for similar cases or asking the community may yield more specific guidance on automatic recovery from pod eviction.