# ask-community
m
Hi folks, what is the current approach to “force interrupt” a flow and its tasks when cancelling? I ask because currently, when one sets the state of the flow to “Cancelled”, Prefect tries to cancel the running tasks gracefully (i.e. it waits for each task to complete before cancelling).
We often run into the case where we have a long-running task that we want to interrupt when we fail/cancel a flow. We had thought that setting the state of the task to Failed would cause an interruption (in our case this means the worker pod on EKS should get terminated and then the job should get terminated), but it does not. Why is that the case? And is there a way to kill the task without having to get access to the EKS cluster to kill the underlying pod? More details on our setup: we are using Prefect version 0.14.12 with Prefect Cloud, a Kubernetes agent, and a dask-kubernetes execution environment on EKS.
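For context, the cancellation we trigger looks roughly like this (a minimal sketch, assuming `Client.cancel_flow_run` is available in this 0.14.x release; the flow-run ID is a placeholder):
```python
# Minimal sketch, assuming Prefect 0.14.x with Prefect Cloud configured;
# "<flow-run-id>" is a placeholder for a real flow-run ID.
from prefect import Client

client = Client()  # reads the Cloud API token from the local Prefect config

# Marks the run as cancelling; the agent then tries to stop tasks gracefully
client.cancel_flow_run(flow_run_id="<flow-run-id>")
```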
j
If you're using a temporary dask cluster, cancelling the flow will shut down the cluster, forcefully stopping all running/pending tasks. It's only when using an external dask cluster (which we can't kill forcefully) that we wait for all running tasks to finish.
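(A sketch of the two `DaskExecutor` modes in 0.14.x terms; the address and kwargs below are illustrative placeholders:)
```python
from prefect.executors import DaskExecutor

# Temporary cluster: created per flow run, so Prefect can tear it down forcefully
temporary = DaskExecutor(
    cluster_class="dask_kubernetes.KubeCluster",
    cluster_kwargs={"n_workers": 2},  # illustrative
)

# External cluster: Prefect only connects to it and cannot kill its workers
external = DaskExecutor(address="tcp://dask-scheduler:8786")  # placeholder address
```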
m
Hi @Jim Crist-Harif, thank you for the prompt response. We are using a temporary dask cluster, but it doesn't look like Prefect is stopping our running tasks. More specifically, we have a task performing some database IO operations that we have had to resort to killing manually.
j
Hmmm. Do you see the other workers in the cluster shutting down? I wonder if the kill signal is being ignored in the active task, preventing the cluster from shutting down forcefully.
We just call `cluster.close()`, so this might also be a dask-kubernetes issue.
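One generic workaround to sketch (plain Python signal handling, not a Prefect-specific API; `long_running_work` is a hypothetical stand-in): have the task respond to SIGTERM itself, so that deleting the pod interrupts it. Note that C-level blocking calls (e.g. inside a database driver) may still not return until they finish.
```python
import signal
import time

def _handle_sigterm(signum, frame):
    # Raising an exception here interrupts pure-Python waits like time.sleep()
    raise SystemExit(f"interrupted by signal {signum}")

signal.signal(signal.SIGTERM, _handle_sigterm)

def long_running_work():
    # Sleep in short slices so the loop stays responsive between signals
    for _ in range(10 * 60):
        time.sleep(1)
```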
m
Good question - in this specific flow there is just that one task running, but I can try to replicate with a flow that runs different types of tasks (some with IO operations and some without).
I'll report back when I've experimented with the task types, and perhaps share a somewhat reproducible example.
@Jim Crist-Harif - sorry for the late reply, but please find this sample flow. It just contains a single task that sleeps for a long time:
```python
import time

from prefect import Flow, task

@task
def long_sleep():
    time.sleep(10 * 60)

# flow name is arbitrary for this standalone example
with Flow("long-sleep") as flow:
    long_sleep()
```
Setting the task to Failed doesn't interrupt the task or kill the worker pod; it instead waits until the sleep is complete. Cancelling the flow, on the other hand, is more explicit about this behavior in the logs:
```
Stopping executor, waiting for 1 active tasks to complete
```
Is there a way to interrupt this long-running task from Cloud and kill the worker?
j
What executor are you running with?
m
`DaskKubernetesEnvironment`
j
Ah, cancellation won't work with that.
Environments were deprecated in 0.14.0 (we're now on 0.14.13).
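Roughly, the migration swaps the environment for a run config plus an executor; a sketch under 0.14.x conventions (the flow name and kwargs are illustrative):
```python
from prefect import Flow
from prefect.executors import DaskExecutor
from prefect.run_configs import KubernetesRun

with Flow("migrated-flow") as flow:  # name is illustrative
    ...

# The run config replaces the environment's pod/job settings
flow.run_config = KubernetesRun()
# The executor replaces the environment's dask-kubernetes cluster settings
flow.executor = DaskExecutor(
    cluster_class="dask_kubernetes.KubeCluster",
    cluster_kwargs={"n_workers": 2},  # illustrative
)
```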
m
I see - we are already in the process of migrating - so that’s good to know
j
To the executor, `DaskKubernetesEnvironment` looks like an external dask cluster (not a temporary one), so you get the same behavior: cancellation waits for the running tasks to finish before the cluster shuts down.
m
I see
thanks for shedding some light on this
j
No problem, please let us know if you run into other issues.