Hi! Are Dask executors the only way to achieve parallelization for mapped tasks? For example, I have a custom task called run_command, and inside it the command is launched via a RunNamespacedJob on Kubernetes. I have multiple commands that take a while to complete, so I would like several namespaced jobs to run at the same time. I tried using a mapping like:
from prefect import Flow

with Flow("my-flow") as flow:  # Flow requires a name; "my-flow" is a placeholder
    run_command.map([cmd1, cmd2, ...])
But Prefect is running each namespaced job serially. Would switching to a Dask executor be the key, or could I adjust the map call to achieve parallelization?
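(For context, run_command here wraps Prefect's RunNamespacedJob task, roughly along these lines. This is a simplified sketch, not the actual task; the image, namespace, and job naming are placeholder assumptions.)

from prefect import task
from prefect.tasks.kubernetes import RunNamespacedJob

@task
def run_command(cmd):
    # Minimal Kubernetes Job manifest that runs one shell command.
    # The image, namespace, and naming scheme are placeholders.
    body = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"run-command-{abs(hash(cmd)) % 100000}"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "runner",
                            "image": "my-image:latest",
                            "command": ["sh", "-c", cmd],
                        }
                    ],
                    "restartPolicy": "Never",
                }
            }
        },
    }
    # RunNamespacedJob is itself a Prefect task; calling .run() executes it
    # inline inside this custom task.
    RunNamespacedJob(body=body, namespace="default").run()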
Kevin Kho
04/14/2021, 8:41 PM
Hi @Justin Chavez! Yes, you need a DaskExecutor or LocalDaskExecutor to achieve parallelism.
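A minimal sketch of wiring that up, assuming a recent Prefect 1.x release (the flow name and command list are placeholders):

from prefect import Flow, task
from prefect.executors import LocalDaskExecutor

@task
def run_command(cmd):
    ...  # launch the namespaced job for `cmd`

with Flow("run-commands") as flow:
    run_command.map(["cmd1", "cmd2", "cmd3"])

# Mapped children now run in parallel on local threads/processes;
# use DaskExecutor instead to fan out to a Dask cluster.
flow.run(executor=LocalDaskExecutor())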
Justin Chavez
04/14/2021, 10:41 PM
Got it, thanks!
Ranu Goldan
04/15/2021, 8:50 AM
Hi @Kevin Kho, an additional question on this: does Prefect Cloud include this feature, or do we need to deploy our own Dask cluster to get this capability?
Kevin Kho
04/15/2021, 1:31 PM
Prefect does not manage hardware; you have to provide your own. I suggest you look at Coiled: https://docs.coiled.io/user_guide/example-prefect.html . They are free right now and you can spin up a Dask cluster pretty easily.
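If you do bring your own cluster (Coiled or otherwise), a rough sketch of pointing the executor at it, where the scheduler address and Coiled cluster arguments are illustrative assumptions:

from prefect.executors import DaskExecutor

# Option 1: connect to a Dask scheduler you already run.
flow.executor = DaskExecutor(address="tcp://dask-scheduler:8786")

# Option 2: have the executor spin up a temporary Coiled cluster
# (cluster kwargs are illustrative; see the Coiled docs linked above).
flow.executor = DaskExecutor(
    cluster_class="coiled.Cluster",
    cluster_kwargs={"n_workers": 4},
)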