
Newskooler

10/29/2020, 9:32 AM
Hi! When I run a fairly simple Flow (get data -> check if it exists -> save to a couple of places), and I map a task so that it needs to run 15k+ times, why does it take over 60 min for the mapped tasks to start? I guess it's expected behaviour (I am running a single-worker Dask executor), but I want to understand why this delay happens, with the hope of optimizing it a bit. Thanks : )
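For context, a minimal sketch of the kind of flow being described, assuming Prefect 0.13-era APIs (the task names and item count are illustrative, not the actual code):

```python
from prefect import Flow, task
from prefect.engine.executors import DaskExecutor

@task
def get_ids():
    # stand-in for "get data": returns the 15k+ items to map over
    return list(range(15_000))

@task
def process(item):
    # stand-in for "check if it exists -> save to a couple of places"
    return item

with Flow("mapped-example") as flow:
    ids = get_ids()
    process.map(ids)

# single local Dask cluster by default
flow.run(executor=DaskExecutor())
```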

Dylan

10/29/2020, 1:49 PM
Hi Stelian! Just to clarify, it takes 60 minutes to start all of the mapped tasks?

Newskooler

10/29/2020, 1:51 PM
Hi @Dylan, so I have mapped and non-mapped tasks. The non-mapped ones (which usually come before the mapped ones) start straight away. Then there is a long delay before the first mapped task starts - 60 min in the case of 15k mapped tasks. The delay scales roughly with the number of mapped tasks.

Dylan

10/29/2020, 2:00 PM
What version of Prefect are you running on?

Newskooler

10/29/2020, 2:01 PM
prefect==0.13.9

Dylan

10/29/2020, 2:08 PM
I'd suggest either:
• Sizing up your Dask worker resources
• Increasing the number of Dask workers
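A hedged sketch of what that could look like with Prefect 0.13's DaskExecutor, assuming the default local cluster (the worker counts below are illustrative, not a recommendation):

```python
from prefect.engine.executors import DaskExecutor

# With no address given, DaskExecutor spins up a local distributed.LocalCluster;
# cluster_kwargs is forwarded to it, so you can ask for more workers/threads.
executor = DaskExecutor(
    cluster_kwargs={"n_workers": 4, "threads_per_worker": 2}  # illustrative values
)

flow.run(executor=executor)
```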

Newskooler

10/29/2020, 2:09 PM
So essentially move to a bigger server and then add more workers?

Dylan

10/29/2020, 2:10 PM
Correct haha

Newskooler

10/29/2020, 2:12 PM
Okay, but why is there a bottleneck here? Is this expected? Can it be improved significantly on my end (as the user) or on yours?

nicholas

10/29/2020, 3:09 PM
Hi @Newskooler - can you give us some idea of your setup?
• Are you running on Kubernetes or something else?
• How large are your tasks? What's the size of the output of cloudpickle.dumps(your_mapped_task) in bytes?
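One way to measure that (here your_mapped_task stands in for the actual task object):

```python
import cloudpickle

# Size of the serialized task in bytes - large values mean more data
# gets shipped to the Dask workers for every mapped child.
print(len(cloudpickle.dumps(your_mapped_task)))
```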

Newskooler

10/29/2020, 3:43 PM
Hi @nicholas:
• I am running a DaskExecutor on a simple server (no Kubernetes)
• I don't quite understand the second question - do you mean how large the data I move between the tasks is?

nicholas

10/29/2020, 3:58 PM
Got it @Newskooler - it's likely that your Dask cluster is starved for resources with that many tasks. Tasks that serialize lots of data and pass it between them, large task graphs, resource-starved schedulers/workers/clients, or even poorly structured code can all be bottlenecks. In your case it sounds like there's a combination of all of the above. For now I'd follow @Dylan's advice and look at the resources you're giving your scheduler/workers.
I'd also take a look at some of the Dask resources for configuring your Dask cluster; you can find really good docs for that here and here.
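For reference, a sketch of pointing the executor at a separately managed Dask cluster, whose resources you control when starting it (the address and CLI flags below are illustrative; see the Dask docs referenced above for the details):

```python
from prefect.engine.executors import DaskExecutor

# Start the cluster yourself, e.g. (illustrative, 2020-era Dask CLI):
#   dask-scheduler
#   dask-worker tcp://127.0.0.1:8786 --nprocs 4 --nthreads 2 --memory-limit 4GB
# then hand the scheduler address to the executor.
executor = DaskExecutor(address="tcp://127.0.0.1:8786")
flow.run(executor=executor)
```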

Newskooler

10/29/2020, 4:03 PM
Okay, thank you very much. I will follow through. The only additional question I have is: when you say "poorly structured code", what do you mean exactly? Is it the flow structure that can potentially be optimized, or the code inside the tasks? If the latter, that code doesn't even run until 60 min after the start 🤔

nicholas

10/29/2020, 4:14 PM
Either of those could be an issue (flow structure or intra-task code) and could be optimized for performance, but like I mentioned, I don't think that's your problem at the moment, since your tasks aren't starting in the first place.
👍 1
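As one hypothetical illustration of a flow-structure optimization: mapping over lightweight references (e.g. file paths) instead of large in-memory objects reduces how much data is serialized and shipped between Dask workers (the task names and paths below are made up):

```python
from prefect import Flow, task

@task
def list_paths():
    # return small strings to map over, not the data itself
    return [f"s3://my-bucket/data/{i}.parquet" for i in range(15_000)]

@task
def handle(path):
    # load, check, and save inside the task, so only the path
    # crosses the wire between the scheduler and workers
    ...

with Flow("reference-mapping") as flow:
    handle.map(list_paths())
```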

Newskooler

10/29/2020, 4:30 PM
Okay, thank you @nicholas
Hi @nicholas, I did try with a huge server and got the same results, so I guess it's some setting somewhere.
Can you please give me some guidance on where / what to look for to address this?

nicholas

11/04/2020, 2:51 PM
Hi @Newskooler, did you take a look at the links I posted above for configuring your Dask Cluster? This one and this one.

Newskooler

11/05/2020, 9:49 AM
I have not. I will check them first and then come back to this if necessary. Thanks!
👍 1