Hi all!
I'm running multiple tasks inside a subflow using the DaskTaskRunner with its default settings, and I keep hitting a wall.
Not all tasks start at once: 12 tasks start right away, then the rest are stuck until a few seconds later.
Looking at the Dask dashboard, I see 4 workers with 3 threads each (4 × 3 = 12, which matches the 12 running tasks).
How can I change these default numbers?
(All the tasks are just waiting on HTTP calls, so I don't mind running a lot of threads.)
from prefect_dask import DaskTaskRunner

# Use 4 worker processes, each with 2 threads
DaskTaskRunner(
    cluster_kwargs={"n_workers": 4, "threads_per_worker": 2}
)
Also, if you're just making HTTP calls, you don't have to use Dask: the default ConcurrentTaskRunner executes submitted tasks using Python's native concurrency. It should be lighter weight if you're only IO-bound.
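For reference, a minimal sketch of that pattern, assuming Prefect 2.x (the task, flow, and URL here are illustrative, not from the thread):

import httpx
from prefect import flow, task
from prefect.task_runners import ConcurrentTaskRunner

@task
def fetch(url: str) -> int:
    # Blocking HTTP call; the task runner overlaps these in threads
    return httpx.get(url).status_code

@flow(task_runner=ConcurrentTaskRunner())
def my_flow() -> list[int]:
    # .submit() returns a future immediately, so all 20 requests
    # run concurrently instead of one after another
    futures = [fetch.submit(f"https://example.com/{i}") for i in range(20)]
    return [f.result() for f in futures]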
Lior Barak
07/18/2023, 7:44 AM
Amazing, thanks!
When I used the ConcurrentTaskRunner, it took quite a long time to start new tasks:
starting 20 tasks (HTTP requests) from a single flow took maybe 4 seconds.
That's why I'm looking into Dask.
Emil Christensen
07/18/2023, 2:52 PM
Hmm, that's odd… it should be very quick, quicker than Dask. Were you using .submit?
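For context, a rough sketch of the difference, assuming Prefect 2.x (task and flow names are illustrative): calling a task directly runs it inline and blocks, while .submit() hands it to the task runner and returns a future, which is what enables concurrency.

from prefect import flow, task

@task
def call_api(i: int) -> int:
    return i  # stand-in for an HTTP request

@flow
def sequential() -> list[int]:
    # Direct calls block: each task finishes before the next starts
    return [call_api(i) for i in range(20)]

@flow
def concurrent() -> list[int]:
    # Submitted tasks run on the task runner and overlap
    futures = [call_api.submit(i) for i in range(20)]
    return [f.result() for f in futures]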
Lior Barak
07/27/2023, 9:57 AM
Ah, I see, I wasn't using submit properly.
Is there an upper limit for ConcurrentTaskRunner?
Let's say I want to run 1000 flows with 100 tasks each in parallel (on Server, not Cloud).
Can I just run Prefect on a strong machine?
Lior Barak
07/27/2023, 9:59 AM
(90% of the tasks are async POST calls to external REST servers)
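Since those tasks are async, one possible shape for the fan-out, assuming Prefect 2.x and httpx (the URL and payload are placeholders): async tasks called directly from an async flow return coroutines, and asyncio.gather runs them concurrently on the event loop without needing threads per request.

import asyncio
import httpx
from prefect import flow, task

@task
async def post_item(url: str, payload: dict) -> int:
    # Non-blocking POST; the event loop overlaps these calls
    async with httpx.AsyncClient() as client:
        resp = await client.post(url, json=payload)
        return resp.status_code

@flow
async def fan_out(n: int) -> list[int]:
    # Direct calls on async tasks return coroutines; gather runs them concurrently
    coros = [post_item("https://example.com/api", {"i": i}) for i in range(n)]
    return await asyncio.gather(*coros)

if __name__ == "__main__":
    asyncio.run(fan_out(100))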