# ask-community
z
Hey all! 👋 I am using the prefect_dask module on Prefect 2.8.7 and am unable to limit the number of tasks it runs in parallel with the "n_workers" kwarg. I've tried "n_workers": 8 and "n_workers": 1, but both kick off all 14 of my tasks at the same time. What I want to do is the following (it worked in Prefect 1), but "scheduler": "threads" is not accepted as a kwarg.
task_runner=DaskTaskRunner(  # allows for parallel execution
    cluster_kwargs={
        "n_workers": 8,
        "scheduler": "threads",  # worked in Prefect 1, not accepted here
    }
)
For context, I need it to run only 8 tasks at a time, not all 14. If it runs all of them, it crashes the CPU on my EC2. I'm poring over the docs but I'm not seeing anything on this.
r
Would a task run concurrency limit do what you need? https://docs.prefect.io/latest/ui/task-concurrency/
z
It might work, thanks. I will give that a try! If that's the best way to limit concurrency, what function does the "n_workers" kwarg serve? Do all workers have unlimited task concurrency? Is each worker designated to run on a single thread?
r
It looks like DaskTaskRunner uses distributed.LocalCluster by default, which IIRC will start a separate process (each with multiple threads) for each worker unless you specify "processes": False as one of the args to the cluster, in which case I believe it uses a thread per worker, though you can also specify threads_per_worker just to be sure. Docs for LocalCluster are here: https://docs.dask.org/en/stable/deploying-python.html#reference
The nice thing about using a Prefect task concurrency limit is that it is agnostic to which task runner you use. Rather than switching the Dask task runner from processes to threads, you may find that Prefect's default concurrent task runner with a concurrency limit set does everything you need.
Otherwise, I'd try something like
task_runner=DaskTaskRunner(
    cluster_kwargs={
        "processes": False,  # run workers as threads instead of separate processes
        "n_workers": 8,
    }
)
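One possible reason both "n_workers" settings started all 14 tasks: as far as I know, a Dask worker runs up to one task per thread, so a LocalCluster can run roughly n_workers × threads_per_worker tasks at once, and when threads_per_worker is left unset it is derived from the machine's core count. Below is a minimal sketch that pins both values, assuming that behavior holds on your versions. The helper names cluster_kwargs_for, task_slots, and make_task_runner are hypothetical, not part of Prefect or Dask.

```python
def cluster_kwargs_for(max_parallel_tasks: int, use_processes: bool = False) -> dict:
    """Build LocalCluster kwargs with one thread per worker.

    Assumption: the cluster runs at most n_workers * threads_per_worker
    tasks at once, so pinning threads_per_worker to 1 makes n_workers the cap.
    """
    if max_parallel_tasks < 1:
        raise ValueError("max_parallel_tasks must be at least 1")
    return {
        "n_workers": max_parallel_tasks,
        "threads_per_worker": 1,
        "processes": use_processes,
    }


def task_slots(cluster_kwargs: dict) -> int:
    """Number of tasks the cluster could run at once under the assumption above."""
    return cluster_kwargs["n_workers"] * cluster_kwargs["threads_per_worker"]


def make_task_runner(max_parallel_tasks: int):
    """Create a DaskTaskRunner capped at max_parallel_tasks (needs prefect-dask)."""
    # Imported lazily so the helpers above can be used without prefect-dask installed.
    from prefect_dask import DaskTaskRunner

    return DaskTaskRunner(cluster_kwargs=cluster_kwargs_for(max_parallel_tasks))
```

You would then pass it to the flow with something like @flow(task_runner=make_task_runner(8)). It's worth checking the Dask dashboard to confirm the cluster really has 8 threads in total.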
z
Awesome - this is great info! I will experiment with both options - but it sounds like both work so I might just use the default task runner to make things simpler 👍
c
Hey @Zachary Loertscher, I was running into a similar issue. Did the Prefect concurrency limit work?
z
Hi @Choenden Kyirong! So the concurrency limit DOES work, but it works DIFFERENTLY. Based on my testing, the concurrent task runner runs tasks in groups and waits for the whole group to finish before moving on to the next. For example, the first screenshot below shows a run with the default concurrent task runner (concurrency limit = 2): tasks appear to wait for their fellow tasks to finish before the next set of 2 is picked up. The second screenshot shows a run with the prefect-dask task runner: tasks don't wait for their fellow tasks to finish before moving on to others. @Ryan Peden, feel free to correct me if I'm wrong, but this is the behavior I'm observing.
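To put numbers on the difference between the two screenshots, here is a toy calculation. It is not Prefect's or Dask's scheduling code, only an illustration of "wait for the whole group" versus "start the next task as soon as a slot frees up". The function names and the example durations are made up.

```python
import heapq


def batched_makespan(durations, limit):
    """Total time when tasks run in fixed groups of `limit` and each group
    must finish completely before the next one starts."""
    total = 0
    for i in range(0, len(durations), limit):
        total += max(durations[i:i + limit])
    return total


def rolling_makespan(durations, limit):
    """Total time when at most `limit` tasks run at once and the next task
    starts the moment any running task finishes."""
    finish_times = []  # min-heap holding the finish time of each running task
    latest = 0
    for duration in durations:
        # Start at time 0 while slots are free, otherwise when the earliest task ends.
        start = 0 if len(finish_times) < limit else heapq.heappop(finish_times)
        end = start + duration
        heapq.heappush(finish_times, end)
        latest = max(latest, end)
    return latest


if __name__ == "__main__":
    durations = [5, 1, 1, 1]  # one slow task followed by three quick ones
    print(batched_makespan(durations, 2), rolling_makespan(durations, 2))
```

With one slow task in the first group, the grouped version leaves a slot idle until the slow task finishes, which matches the gaps I think I'm seeing in the first screenshot.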
r
Apologies for the slow reply. That's not quite what I expected from the concurrent task runner, but it definitely only runs things concurrently, not in parallel, which can work well if your tasks yield execution by awaiting or performing I/O, etc. How have things worked out with this? Is Dask doing what you need?
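For anyone wondering what "concurrently, not in parallel" looks like in practice, here is a small plain-asyncio sketch. It is not Prefect code. It assumes the tasks spend their time awaiting (simulated here with asyncio.sleep), and run_limited is a hypothetical name. A semaphore caps how many tasks are in flight at once, all on a single thread.

```python
import asyncio


async def run_limited(n_tasks: int, limit: int) -> int:
    """Run n_tasks awaiting tasks with at most `limit` in flight at once.

    Returns the peak number of tasks that were running at the same time.
    """
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def io_bound_task() -> None:
        nonlocal running, peak
        async with semaphore:  # wait here until one of the `limit` slots is free
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for I/O; yields to the event loop
            running -= 1

    await asyncio.gather(*(io_bound_task() for _ in range(n_tasks)))
    return peak


if __name__ == "__main__":
    # 14 tasks, at most 8 at a time, as in the original question.
    print(asyncio.run(run_limited(14, 8)))
```

This only helps when the work is I/O-bound. CPU-heavy tasks never yield to the event loop, so for those a process-based option such as Dask workers is the better fit.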