Charles Leung
05/15/2024, 12:11 AM
Marvin
05/15/2024, 12:11 AM
`DaskTaskRunner` configuration.
- Example:
```python
from prefect_dask import DaskTaskRunner

DaskTaskRunner(cluster_kwargs={"n_workers": 4, "threads_per_worker": 2})  # 4 worker processes, 2 threads each
```
- Dask Collections: For distributing Dask collections across workers to achieve parallel computation, you can use `get_dask_client` or `get_async_dask_client` as context managers, for example:
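Here is a minimal sketch of that pattern, assuming the `prefect-dask` integration is installed; the demo dataset and the flow/task names are illustrative:
```python
from prefect import flow, task
from prefect_dask import DaskTaskRunner, get_dask_client
import dask.datasets

@task
def summarize():
    # Inside the context, the flow's Dask cluster is the current client,
    # so .compute() runs on the Dask workers rather than in this task.
    with get_dask_client():
        df = dask.datasets.timeseries()  # demo Dask DataFrame
        return df.groupby("name").x.mean().compute()

@flow(task_runner=DaskTaskRunner())
def dask_collections_flow():
    return summarize.submit().result()

if __name__ == "__main__":
    dask_collections_flow()
```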
### Links to Documentation
- Work Pools and Workers Overview
- Scaling and Managing Multiple Task Queues
- DaskTaskRunner Usage Guide
To distribute your tasks across multiple workers specifically, you would need to:
1. Set up a work pool whose type matches your execution environment and scaling needs.
2. Configure multiple workers that poll the work pool, or are triggered by it, depending on the work pool's type.
3. Define your flows with a scalable task runner like `DaskTaskRunner`, customizing `cluster_kwargs` to create multiple Dask workers as needed (sketched below).
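As a rough illustration of step 3, here is a sketch of a flow that fans 100 task runs out across a local Dask cluster; the task body and the names `process_item`/`distributed_flow` are placeholders:
```python
from prefect import flow, task
from prefect_dask import DaskTaskRunner

@task
def process_item(i: int) -> int:
    # Placeholder workload; replace with your real task logic.
    return i * 2

@flow(task_runner=DaskTaskRunner(
    cluster_kwargs={"n_workers": 4, "threads_per_worker": 2}
))
def distributed_flow():
    # .submit() returns futures immediately; Dask schedules the
    # 100 task runs across the cluster's workers.
    futures = [process_item.submit(i) for i in range(100)]
    return [f.result() for f in futures]

if __name__ == "__main__":
    distributed_flow()
```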
With this setup, your deployed flow should distribute the 100 concurrent tasks across the available workers, achieving distributed execution. If you run into issues, use Prefect's monitoring, centralized logging, and observability features to diagnose and manage how the tasks are being distributed.