Hey! Question about flow cancellation & Dask workers on k8s (Prefect 2.7.7)
I’ve got a Kubernetes-deployed flow that uses dask_kubernetes to spawn Dask pods (scheduler + workers) to execute its tasks. When I cancel a running flow from the Cloud UI, the flow run is terminated and the Prefect job pod is deleted, but the Dask pods are left hanging: still alive, but not doing anything. Is that intended, and if so, any suggestions on the best approach for auto-cleanup? At the end of a failed or successful flow run these pods are terminated automatically, but not when the flow is cancelled.
01/11/2023, 2:47 PM
Hi David, you would likely need a custom handler. The Dask scheduler should be responsible for cleaning up the Dask workers, but if that’s not the case, you would need a handler to do it manually.
I can check with the team whether that could be included natively in the integration.
01/11/2023, 2:50 PM
Hmm, but the Dask scheduler is left hanging as well. I think when the flow run is cancelled, the prefect-job pod just gets deleted without a chance to send the Dask scheduler a signal to terminate. Whereas when a flow runs to completion, the prefect-job pod sends the termination signal to the Dask scheduler (and its workers)?
01/11/2023, 3:01 PM
Let me check. When you cancel, the cancellation goes directly to the agent, which submits it to the infrastructure. There should be a ~30 second grace period before the pod is forcefully killed, but I’ll check with the team.
Like you say, it feels like the agent should let the flow take a cancellation action (e.g. send a termination signal to the Dask scheduler), but at present the flow pod just gets deleted immediately.