When a flow run that you cancelled from the Prefect UI remains in the "Cancelling" state indefinitely, the cause is usually related to infrastructure or process handling. Common reasons include:
1. Infrastructure Support: The infrastructure on which the flow is running might not support cancellation properly. If the infrastructure doesn't handle cancellation requests, the flow run might stay stuck in the "Cancelling" state.
2. Scope Mismatch: If the identifier scope does not match when attempting to cancel a flow run, the worker may not be able to cancel it, causing it to remain in the "Cancelling" state.
3. Missing Infrastructure: If the infrastructure associated with the run cannot be found or has already been terminated, the flow run might not transition to "Cancelled".
4. Race Conditions: There might be race conditions, especially when dealing with nested or dependent flows, causing sub-deployments to get stuck in the "Cancelling" state.
5. Version-Specific Bugs: Certain versions of Prefect or its integrations (e.g., prefect-kubernetes) may have bugs that affect the transition from "Cancelling" to "Cancelled". For example, issues have been reported with specific versions of prefect-kubernetes causing jobs to remain stuck in "Cancelling".
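If a run is stuck for one of the reasons above, one possible workaround is to find runs that have stayed in "Cancelling" too long and force their state to "Cancelled" through the Prefect client. The sketch below is illustrative only: find_stuck_runs and force_cancel are hypothetical helper names, the run dictionaries (with "id", "state_name", and "state_timestamp" keys) are an assumed shape rather than a Prefect type, and the 10-minute threshold is arbitrary. It assumes a Prefect version where get_client, prefect.states.Cancelled, and set_flow_run_state(..., force=True) are available.

```python
from datetime import datetime, timedelta, timezone


def find_stuck_runs(runs, max_age=timedelta(minutes=10), now=None):
    """Return IDs of runs that have sat in "Cancelling" longer than max_age.

    `runs` is a hypothetical list of dicts with the keys "id",
    "state_name", and "state_timestamp" (a timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    return [
        run["id"]
        for run in runs
        if run["state_name"] == "Cancelling"
        and now - run["state_timestamp"] > max_age
    ]


async def force_cancel(flow_run_id):
    """Force a flow run into Cancelled, bypassing orchestration rules.

    This only changes the state recorded by the API. It does not stop
    any infrastructure that may still be running.
    """
    # Imported here so find_stuck_runs works without Prefect installed.
    from prefect import get_client
    from prefect.states import Cancelled

    async with get_client() as client:
        await client.set_flow_run_state(flow_run_id, state=Cancelled(), force=True)
```

To apply it to a single run, something like asyncio.run(force_cancel(flow_run_id)) should work. Because forcing the state does not touch the underlying job, pod, or process, check separately that the infrastructure has actually stopped.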
For more details, you can refer to the relevant GitHub issues and documentation:
- Job stuck in Cancelling state
- Race condition causing sub-deployments to be stuck
For a deeper understanding of the cancellation process and potential pitfalls, you might want to review the Prefect documentation on managing states.