https://prefect.io logo
Title
j

Jonah Benton

09/01/2019, 4:19 PM
Hi folks, dumb questions I'm having a hard time finding answers to- What happens in a Dask cluster if/when your dask-scheduler process dies? Tasks continue to run to completion on workers but nothing new gets scheduled? Can you just restart the scheduler and work picks up? Is workflow state lost if the scheduler dies; if not, where does the scheduler keep it? Happy to be pointed to documentation on these questions, I just don't see lower level operational details discussed in the Dask site docs.
j

Jeremiah

09/01/2019, 5:46 PM
Hi Jonah, the scheduler is a point of failure in Dask. As you wrote, tasks will complete but nothing new will happen. The scheduler maintains state, so a new scheduler can not solve the issue. Here are some relevant docs: https://distributed.dask.org/en/latest/resilience.html
j

Jonah Benton

09/01/2019, 5:52 PM
Ok, thanks Jeremiah. I had missed those two sentences:
The process containing the scheduler might die. There is currently no persistence mechanism to record and recover the scheduler state.

The workers and clients will all reconnect to the scheduler after it comes back online but records of ongoing computations will be lost.
j

Jeremiah

09/01/2019, 5:53 PM
No worries, it’s hidden away 😉 we bacame very familiar with the ins and outs of Dask over the last year