Seth Goodman

07/19/2022, 9:32 PM
Hi all, is there a best practice for parallelization within tasks? (For reference, I'm using a local Dask cluster.) My initial flow used task mapping, but there were tens of thousands of mapped items, which quickly burns through the free tier limit. My current thought is to test basic parallelization (e.g., Python's multiprocessing) within a task, but I worry that it will interfere with Dask's use of resources. Thanks in advance for any suggestions.

Kevin Kho

07/19/2022, 9:33 PM
You can do that, but remove the DaskExecutor: Dask will not allow the two-stage parallelism that would result. For the most part, you can just use LocalExecutor and then use Dask or multiprocessing inside the task.
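
A minimal sketch of the second option (multiprocessing inside one task), using only the standard library so it runs without Prefect installed. `process_item`, `chunked`, `process_chunk`, and `process_all` are hypothetical names, and squaring a number stands in for the real per-item work. The idea is that `process_all` would be the body of a single task in a flow run with LocalExecutor, so tens of thousands of items count as one task run rather than one mapped run each.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Iterator, List, Sequence


def process_item(item: int) -> int:
    """Hypothetical stand-in for the real per-item work."""
    return item * item


def chunked(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_chunk(chunk: Sequence[int]) -> List[int]:
    """Handle one chunk sequentially inside a single worker process."""
    return [process_item(item) for item in chunk]


def process_all(items: Sequence[int], workers: int = 4, chunk_size: int = 1000) -> List[int]:
    """Fan the chunks out across local processes and collect results in order.

    Assumption: in a real flow this function would be called from (or be) one
    task, with the flow itself running on LocalExecutor rather than DaskExecutor.
    """
    results: List[int] = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        for part in pool.map(process_chunk, chunked(items, chunk_size)):
            results.extend(part)
    return results


if __name__ == "__main__":
    squares = process_all(list(range(10_000)), workers=4, chunk_size=1000)
    print(len(squares), squares[:5])
```

Sending chunks rather than single items to the pool keeps the pickling overhead per unit of work low; the chunk size and worker count here are arbitrary and would need tuning for the actual workload.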

Seth Goodman

07/19/2022, 9:38 PM
Thanks for the quick response. So any use of parallelization within tasks would ultimately mean giving up task parallelization?

Kevin Kho

07/19/2022, 9:47 PM
You should not have both, because two-stage parallelization can cause resource contention.
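
A rough illustration of why the two stages compete for resources, assuming every outer worker starts its own inner pool. The function names are hypothetical and the worker counts are made-up example values, not measurements from this flow.

```python
import os


def total_processes(outer_workers: int, inner_processes: int) -> int:
    """Processes alive at once if each outer worker spawns its own inner pool."""
    return outer_workers * inner_processes


def oversubscription(outer_workers: int, inner_processes: int, cores: int) -> float:
    """Ratio of runnable processes to CPU cores; above 1.0 means contention."""
    return total_processes(outer_workers, inner_processes) / cores


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    # Example values: one outer worker per core, each with a pool of the same size.
    ratio = oversubscription(outer_workers=cores, inner_processes=cores, cores=cores)
    print(f"{cores} cores, {total_processes(cores, cores)} processes, ratio {ratio:.1f}")
```

With a single stage of parallelism, the process count stays at roughly one per core instead of growing with the product of the two pool sizes.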

Seth Goodman

07/19/2022, 9:47 PM
Got it, thanks!