
Prefect Community

Hi all, is there a best practice for parallelization within tasks? (For reference, I'm using a local Dask cluster.) My initial flow used task mapping, but there were tens of thousands of mapped items, which quickly burned through the free-tier limit. My current thought is to try basic parallelization inside a task (e.g., Python's multiprocessing), but I worry that it will interfere with Dask's use of resources. Thanks in advance for any suggestions.

You can do that, but remove the DaskExecutor: Dask will not allow the two-stage parallelism that would result (Dask running tasks in parallel, and each task parallelizing again inside itself). Use the LocalExecutor instead and, for the most part, run Dask or multiprocessing inside the task.
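A minimal sketch of that suggestion, assuming the per-item work is CPU-bound and picklable: a single task owns one process pool, so the pool is the only layer of parallelism. `transform` and `process_items` are hypothetical names, and the Prefect side (one `@task` run under `LocalExecutor`) is only described in comments, not imported, so the sketch runs with the standard library alone.

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import List, Optional, Type


def transform(item: int) -> int:
    """Hypothetical per-item work, standing in for what each mapped task did."""
    return item * item


def process_items(
    items: List[int],
    max_workers: Optional[int] = None,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> List[int]:
    """Run `transform` over every item inside one task, using a single pool.

    In a Prefect flow this would be the body of one task, with the flow run
    under LocalExecutor so that no second (Dask) layer of parallelism exists.
    """
    workers = max_workers or os.cpu_count() or 1
    # For a process pool, chunksize sends items to workers in batches, which
    # cuts per-item overhead when there are tens of thousands of small items.
    chunksize = max(1, len(items) // (workers * 4))
    with executor_cls(max_workers=workers) as pool:
        # map preserves input order in the returned results.
        return list(pool.map(transform, items, chunksize=chunksize))


if __name__ == "__main__":
    # The guard is required for process pools on platforms that spawn workers.
    print(process_items(list(range(10)))[:5])  # → [0, 1, 4, 9, 16]
```

Passing `executor_cls=ThreadPoolExecutor` is an option if the per-item work is I/O-bound rather than CPU-bound; the calling code stays the same.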

Thanks for the quick response. So any use of parallelization within tasks would ultimately mean giving up task parallelization?

You should not have both, because two-stage parallelization can cause resource contention.
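To make the contention concrete, here is a back-of-the-envelope sketch with illustrative (not measured) numbers: if each of several outer workers (for example, Dask worker processes) starts its own inner pool, the number of busy processes multiplies while the core count stays fixed. `oversubscription` is a hypothetical helper written only for this illustration.

```python
import os


def oversubscription(outer_workers: int, inner_workers: int, cores: int) -> float:
    """Ratio of concurrently busy processes to CPU cores when every outer
    worker starts its own pool of `inner_workers` processes."""
    return (outer_workers * inner_workers) / cores


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    # Two stages: one outer worker per core, each sizing its pool to all cores.
    print(oversubscription(outer_workers=cores, inner_workers=cores, cores=cores))
    # One stage: a single task at a time (LocalExecutor) plus one pool.
    print(oversubscription(outer_workers=1, inner_workers=cores, cores=cores))  # → 1.0
```

For example, 4 outer workers each running an 8-process pool on an 8-core machine gives a ratio of 4.0: four CPU-bound processes competing for every core.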