Hello Prefect Community,
I started working on a distributed task queue library a few months back. The library is available as a python package to install a start using :
daskqueue - pypi package
For all its greatness, Dask implements a central scheduler (basically a simple tornado event loop) involved in every decision, which can sometimes create a central bottleneck.
This is a pretty serious limitation when trying to use Dask in high-throughput situations.
Daskqueue is a small python library built on top of Dask and Dask Distributed that implements a very lightweight
Distributed Task Queue. Daskqueue also implements persistent queues for holding tasks on disk and surviving Dask cluster restart. This could be used as a special Executor to enqueue work. I have experiment the limits of the
.map()
method where mapping on a large Iterable will bring the Dask cluster to a halt.
I also wrote an article about implementation details:
https://medium.com/@aminedirhoussi1/daskqueue-dask-based-distributed-task-queue-6fb95517dfea
Hope you enjoy it, can't wait to hear about your feedback 😉 !