Hi folks, dumb questions I'm having a hard time finding answers to-
What happens in a Dask cluster if/when your dask-scheduler process dies? Tasks continue to run to completion on workers but nothing new gets scheduled? Can you just restart the scheduler and work picks up? Is workflow state lost if the scheduler dies; if not, where does the scheduler keep it? Happy to be pointed to documentation on these questions, I just don't see lower level operational details discussed in the Dask site docs.
j
Jeremiah
09/01/2019, 5:46 PM
Hi Jonah, the scheduler is a point of failure in Dask. As you wrote, tasks will complete but nothing new will happen. The scheduler maintains state, so a new scheduler can not solve the issue. Here are some relevant docs: https://distributed.dask.org/en/latest/resilience.html
j
Jonah Benton
09/01/2019, 5:52 PM
Ok, thanks Jeremiah. I had missed those two sentences:
Copy code
The process containing the scheduler might die. There is currently no persistence mechanism to record and recover the scheduler state.
The workers and clients will all reconnect to the scheduler after it comes back online but records of ongoing computations will be lost.
j
Jeremiah
09/01/2019, 5:53 PM
No worries, it’s hidden away 😉 we bacame very familiar with the ins and outs of Dask over the last year
Bring your towel and join one of the fastest growing data communities. Welcome to our second-generation open source orchestration platform, a completely rethought approach to dataflow automation.