# ask-community
m
Hi folks, what is the current approach to “force interrupt” a flow and its tasks when cancelling? I ask because currently, when one sets the state of the flow to “Cancelled”, Prefect tries to cancel the running tasks gracefully (i.e. it waits for each task to complete before cancelling).
We often run into the case where we have a long-running task that we want to interrupt when we fail/cancel a flow. We had thought that setting the state of the task to Failed would cause an interruption (in our case this means the worker pod on EKS should get terminated and then the job should get terminated), but it does not. Why is that the case? And is there a way to kill the task without having to get access to the EKS cluster to kill the underlying pod? More details on our setup: we are using Prefect version 0.14.12 with Prefect Cloud, a Kubernetes agent, and a dask-kubernetes execution environment on EKS.
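For context, the cancellation we trigger looks roughly like this (a minimal sketch, assuming `Client.cancel_flow_run` is available in this 0.14.x release; the flow-run ID is a placeholder):
```python
# Minimal sketch, assuming Prefect 0.14.x with Prefect Cloud configured;
# "<flow-run-id>" is a placeholder for a real flow-run ID.
from prefect import Client

client = Client()  # reads the Cloud API token from the local Prefect config

# Marks the run as cancelling; the agent then tries to stop tasks gracefully
client.cancel_flow_run(flow_run_id="<flow-run-id>")
```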
j
If you're using a temporary dask cluster, cancelling the flow will shut down the cluster, forcefully stopping all running/pending tasks. It's only when using an external dask cluster (which we can't kill forcefully) that we wait for all running tasks to finish.
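(A sketch of the two `DaskExecutor` modes in 0.14.x terms; the address and kwargs below are illustrative placeholders:)
```python
from prefect.executors import DaskExecutor

# Temporary cluster: created per flow run, so Prefect can tear it down forcefully
temporary = DaskExecutor(
    cluster_class="dask_kubernetes.KubeCluster",
    cluster_kwargs={"n_workers": 2},  # illustrative
)

# External cluster: Prefect only connects to it and cannot kill its workers
external = DaskExecutor(address="tcp://dask-scheduler:8786")  # placeholder address
```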
m
Hi @Jim Crist-Harif, thank you for the prompt response. We are using a temporary dask cluster, but it doesn't look like Prefect is stopping our running tasks. More specifically, we have a task performing some database IO operations that we have had to resort to killing manually.
j
Hmmm. Do you see the other workers in the cluster shutting down? I wonder if the kill signal is being ignored in the active task, preventing the cluster from shutting down forcefully.
We just call `cluster.close()`, so this might also be a dask-kubernetes issue.
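One generic workaround to sketch (plain Python signal handling, not a Prefect-specific API; `long_running_work` is a hypothetical stand-in): have the task respond to SIGTERM itself, so that deleting the pod interrupts it. Note that C-level blocking calls (e.g. inside a database driver) may still not return until they finish.
```python
import signal
import time

def _handle_sigterm(signum, frame):
    # Raising an exception here interrupts pure-Python waits like time.sleep()
    raise SystemExit(f"interrupted by signal {signum}")

signal.signal(signal.SIGTERM, _handle_sigterm)

def long_running_work():
    # Sleep in short slices so the loop stays responsive between signals
    for _ in range(10 * 60):
        time.sleep(1)
```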
m
Good question - in this specific flow there is just that one task running, but I can try to replicate with a flow that runs different types of tasks (some with IO operations and some without).
I'll report back when I've experimented with the task types, and perhaps share a somewhat reproducible example.
@Jim Crist-Harif - sorry for the late reply, but please find this sample flow. It just contains a single task that sleeps for a long time:
```python
import time

from prefect import Flow, task

@task
def long_sleep():
    time.sleep(10 * 60)

# flow name is arbitrary for this standalone example
with Flow("long-sleep") as flow:
    long_sleep()
```
Setting the task to Failed doesn't interrupt the task or kill the worker pod; it instead waits until the sleep is complete. Cancelling the flow, on the other hand, is more explicit about this behavior in the logs:
```
Stopping executor, waiting for 1 active tasks to complete
```
Is there a way to interrupt this long-running task from Cloud and kill the worker?
j
What executor are you running with?
m
`DaskKubernetesEnvironment`
j
Ah, cancellation won't work with that.
Environments were deprecated in 0.14.0 (we're now on 0.14.13).
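Roughly, the migration swaps the environment for a run config plus an executor; a sketch under 0.14.x conventions (the flow name and kwargs are illustrative):
```python
from prefect import Flow
from prefect.executors import DaskExecutor
from prefect.run_configs import KubernetesRun

with Flow("migrated-flow") as flow:  # name is illustrative
    ...

# The run config replaces the environment's pod/job settings
flow.run_config = KubernetesRun()
# The executor replaces the environment's dask-kubernetes cluster settings
flow.executor = DaskExecutor(
    cluster_class="dask_kubernetes.KubeCluster",
    cluster_kwargs={"n_workers": 2},  # illustrative
)
```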
m
I see - we are already in the process of migrating - so that’s good to know
j
To the executor, `DaskKubernetesEnvironment` looks like an external dask cluster (not a temporary one), so you get the same behavior: cancellation waits for the running tasks to finish before the cluster shuts down.
m
I see
thanks for shedding some light on this
j
No problem, please let us know if you run into other issues.