I was curious what was the limit for the number of mapped tasks? (not sure if I have the terms correct) I want to run some adhoc flows that potentially kick off anywhere between 100-100,000 tasks (the same task but parameterized)
k
Kevin Kho
05/27/2021, 3:27 AM
Hey @nick vazquez, I think the answer for this is the limits of your hardware. We have seen workloads on the order of 10,000 task runs.
n
nick vazquez
05/27/2021, 3:28 AM
(Nice talk at the Dask summit btw 🙂) Thanks for the quick response! When you say
limits of hardware
, are you referring to the amount of memory needed to hold the intermediate values between Dask tasks?
k
Kevin Kho
05/27/2021, 3:35 AM
Thank you for watching! Yes and also if your cluster has enough workers to run the tasks.
n
nick vazquez
05/27/2021, 3:35 AM
Does it need to be able to run all of the mapped tasks, or will it just queue them?
k
Kevin Kho
05/27/2021, 3:36 AM
It will queue them and execute them in a depth-first manner (although it sometimes runs breadth-first; it's hard to control). If you have 1000 tasks over 10 workers, it will run 10 at a time.
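A minimal standard-library sketch (not Prefect's actual scheduler) of the behavior described above: submitting far more tasks than there are workers simply queues them, and at most `max_workers` run concurrently. The `mapped_task` function and concurrency counter are illustrative stand-ins, not Prefect APIs.

```python
# Sketch: queuing 1000 "mapped" tasks over a pool of 10 workers.
# All 1000 are accepted up front; only 10 execute at any one time.
import threading
from concurrent.futures import ThreadPoolExecutor

in_flight = 0   # tasks currently executing
peak = 0        # highest observed concurrency
lock = threading.Lock()

def mapped_task(x: int) -> int:
    """Stand-in for one mapped task run; tracks peak concurrency."""
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    result = x * 2  # the actual "work"
    with lock:
        in_flight -= 1
    return result

with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(mapped_task, range(1000)))

# All 1000 tasks complete, and peak never exceeds the 10 workers.
```

The same principle applies on a Dask cluster: the scheduler holds the full task graph and feeds work to however many workers exist, so the cluster does not need a slot for every mapped task at once.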