<@ULVA73B9P> we have flow runs stuck in the "Runni...
# ask-marvin
w
@Marvin we have flow runs stuck in the "Running" state. They've clearly failed as the kubernetes pod is no longer around. Is there no heartbeat that updates the flow run state?
m
It seems you're experiencing an issue where flow runs are stuck in the "Running" state due to the Kubernetes pod no longer being present. This is a known problem and has been discussed in the Prefect community. Here are some insights and possible solutions:

1. Kubernetes Pod Lifecycle: Prefect's handling of Kubernetes pod lifecycle events, such as evictions or restarts, can cause flow runs to remain in "Running" or "Pending" states. This can happen if the Prefect worker loses context about the running jobs. Prefect does not automatically handle these scenarios well, and you may need to implement custom automations or notifications to manage such events. GitHub Issue.
2. Heartbeat Mechanism: While Prefect does have mechanisms to monitor flow run states, they may not always update correctly if the underlying infrastructure (like Kubernetes) changes unexpectedly. Prefect 3.x does not explicitly mention a heartbeat mechanism in the documentation for handling such cases.
3. Version-Specific Issues: If you are using specific versions of Prefect and prefect-kubernetes, there may be bugs that affect how job states are updated. For example, a known issue exists with prefect-kubernetes 0.5.3 where jobs get stuck in the "Cancelling" state. GitHub Issue.

To mitigate these issues, consider the following:
- Monitor and manually update the state of stuck flow runs if necessary.
- Implement automations to handle long-running jobs or unexpected pod terminations.
- Stay updated with the latest Prefect and prefect-kubernetes releases as they may contain fixes for such issues.

For further details, you might want to consult the Prefect documentation or the relevant GitHub issues.
b
Hi Will! I think upgrading to version 3.1.8 of Prefect will help you out with this. It introduces flow run heartbeats to prevent 'zombie' flow runs. There are some instructions in the link on how to set this up.
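
For anyone wondering how a heartbeat can catch a 'zombie' run, here is a minimal sketch of the general idea in plain Python. It is not Prefect's implementation and does not call the Prefect API: `RunRecord`, `find_zombies`, `mark_crashed`, the 30-second interval, and the three-missed-beats threshold are all hypothetical names and values chosen for illustration. The assumption is that a running process periodically records a timestamp, and a separate check treats any "Running" run whose last timestamp is too old as dead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunRecord:
    """Hypothetical stand-in for a flow run as tracked by a server."""
    run_id: str
    state: str
    last_heartbeat: datetime


def find_zombies(runs, now, heartbeat_interval=timedelta(seconds=30), missed_beats=3):
    """Return ids of runs that claim to be Running but stopped sending heartbeats.

    A run counts as a zombie when its last heartbeat is older than
    `missed_beats` heartbeat intervals (both values are assumptions).
    """
    cutoff = now - heartbeat_interval * missed_beats
    return [
        run.run_id
        for run in runs
        if run.state == "Running" and run.last_heartbeat < cutoff
    ]


def mark_crashed(runs, zombie_ids):
    """Move every zombie run to a terminal 'Crashed' state, in place."""
    zombies = set(zombie_ids)
    for run in runs:
        if run.run_id in zombies:
            run.state = "Crashed"


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    runs = [
        # Heartbeat 10 seconds ago: still alive.
        RunRecord("healthy", "Running", now - timedelta(seconds=10)),
        # Heartbeat 10 minutes ago: the pod is presumably gone.
        RunRecord("stuck", "Running", now - timedelta(minutes=10)),
        # Old heartbeat, but already in a terminal state: ignored.
        RunRecord("done", "Completed", now - timedelta(hours=1)),
    ]
    zombie_ids = find_zombies(runs, now)
    mark_crashed(runs, zombie_ids)
    print(zombie_ids, [run.state for run in runs])
```

The point of the design is that the check does not depend on the pod still existing: a run whose pod was evicted simply stops producing timestamps, and the staleness check is what moves it out of "Running".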