Hi! I’m working on a flow which requires large concurrency (tens of thousands of tasks). Previously I used the `LocalDaskExecutor`, as my concurrency requirements were low, but for this flow I want to use the real `DaskExecutor`.
As far as I’m aware, Prefect can create a temporary Dask cluster for the running flow (using `KubeCluster`), but it can alternatively use an existing deployed Dask cluster (or even Dask Gateway, if I’m not mistaken). Note that I’m running in Kubernetes, so adaptive scaling could be interesting.
Are there any guidelines / suggestions / experiences for using a temporary vs. static Dask cluster? Additionally, if I use `Docker` as storage, how can I supply the correct image tag containing my modules/dependencies when registering the flow?
Anna Geller
12/08/2021, 9:35 AM
@Joël Luijmes Yes, there are! This page discusses the differences. And regarding KubeCluster, check out this blog post.
Anna Geller
12/08/2021, 9:36 AM
but if you have any specific questions or issues, let us know here
Joël Luijmes
12/08/2021, 9:38 AM
Phew, quick reply. Thanks! I had missed that documentation page, but discovered I can access the image using `prefect.context.image` 🤓
Anna Geller
12/08/2021, 9:38 AM
Regarding the image: you can provide it to your run configuration and specify the image from the context in your Cluster class.
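For example, a minimal sketch of that pattern, assuming Prefect 1.x with `KubernetesRun` and `dask_kubernetes` (the registry/image names and worker counts are hypothetical):

```python
import prefect
from prefect import Flow
from prefect.executors import DaskExecutor
from prefect.run_configs import KubernetesRun

def make_cluster(n_workers):
    """Create a temporary Dask cluster that reuses the flow run's image."""
    from dask_kubernetes import KubeCluster, make_pod_spec
    # prefect.context.image is populated at runtime from the run config,
    # so the Dask workers get the same dependencies as the flow run itself.
    pod_spec = make_pod_spec(image=prefect.context.image)
    return KubeCluster(pod_spec, n_workers=n_workers)

flow = Flow("my-flow")  # hypothetical flow name
flow.run_config = KubernetesRun(image="my-registry.example.com/my-flow:2021-12-08")
flow.executor = DaskExecutor(
    cluster_class=make_cluster,
    cluster_kwargs={"n_workers": 4},
    # Optional: let the cluster scale adaptively instead of staying fixed.
    adapt_kwargs={"minimum": 1, "maximum": 50},
)
```

The key design point is that the executor receives a callable (`cluster_class`) rather than an already-created cluster, so the cluster is only spun up at flow runtime, when `prefect.context.image` is available.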
Joël Luijmes
Okay, one more thing: should the image already contain the flow? Or is there some magic where Prefect serializes the flow and copies it to the Dask workers before starting tasks?
Anna Geller
12/08/2021, 9:59 AM
Storage defines where Prefect can find the flow, so this image doesn’t have to contain the flow; it’s only for package dependencies. But if you use Docker storage, then the image does need to contain the flow, either as a pickled flow (the default when using Docker storage) or stored as a script (when you set `stored_as_script=True`).
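To illustrate, a minimal Docker storage setup could look like this (the registry URL, image names, and dependency list are hypothetical; `stored_as_script` switches between the pickled and script-based modes in Prefect 1.x):

```python
from prefect import Flow
from prefect.storage import Docker

flow = Flow("my-flow")  # hypothetical flow name
flow.storage = Docker(
    registry_url="my-registry.example.com",   # where the built image is pushed
    image_name="my-flow",
    image_tag="2021-12-08",
    python_dependencies=["dask-kubernetes"],  # extra packages baked into the image
    # stored_as_script=True,                  # store the flow as a script instead of a pickle
)
```

At registration time Prefect builds and pushes this image with the flow baked in, which is why the image tag you register with is the one that must contain your modules.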