# ask-community
j
Hi! I’m working on a flow which requires large concurrency (tens of thousands of tasks). Previously I used the LocalDaskExecutor, as my concurrency requirements were low, but for this flow I want to use the real DaskExecutor. As far as I’m aware, Prefect can create a temporary Dask cluster for the running flow (using KubeCluster), but it can alternatively use an existing deployed Dask cluster (or even Dask Gateway, if I’m not mistaken). Note: I’m running in Kubernetes, so adaptive scaling could be interesting. Are there any guidelines / suggestions / experiences for choosing a temporary vs. a static Dask cluster? Additionally, if I have Docker as storage, how can I supply the correct image tag containing my modules/dependencies while registering the flow?
a
@Joël Luijmes Yes, there are! This page discusses the differences, and regarding KubeCluster, check out this blog post.
If you have any specific questions or issues, let us know here.
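The temporary-vs-static choice can be sketched with Prefect 1.x’s DaskExecutor, which supports both modes. This is an illustrative sketch, not from the thread: the scheduler address, scaling bounds, and the assumption that dask-kubernetes is installed are all mine.

```python
from prefect.executors import DaskExecutor

# Temporary cluster: Prefect spins up a Dask cluster for the flow run
# and tears it down afterwards. With adapt_kwargs the cluster scales
# between the given worker counts (bounds here are illustrative).
temporary = DaskExecutor(
    cluster_class="dask_kubernetes.KubeCluster",  # assumes dask-kubernetes is installed
    adapt_kwargs={"minimum": 2, "maximum": 50},
)

# Static cluster: point the executor at an already-running scheduler.
# The service address below is a hypothetical in-cluster DNS name.
static = DaskExecutor(address="tcp://dask-scheduler:8786")
```

A rough rule of thumb: a temporary cluster keeps flow runs isolated and costs nothing while idle, while a static cluster avoids per-run startup latency and can be shared across flows.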
j
Oef, quick reply. Thanks! I missed this documentation page, but discovered I can access the image using prefect.context.image 🤓
a
Regarding the image: You can provide it to your run configuration and specify the image from the context in your Cluster class, e.g.:
```python
FargateCluster(n_workers=n_workers, image=prefect.context.image)
```
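Since the asker is on Kubernetes, the same idea with KubeCluster might look like the sketch below (Prefect 1.x / dask-kubernetes, an illustration rather than the answerer’s exact setup). Note that prefect.context.image is only populated at flow runtime, so the cluster is built inside a factory function rather than eagerly at registration time:

```python
import prefect
from dask_kubernetes import KubeCluster, make_pod_spec
from prefect.executors import DaskExecutor

def make_cluster():
    # Resolved at flow runtime, when prefect.context.image is available.
    return KubeCluster(make_pod_spec(image=prefect.context.image))

# Passing a callable defers cluster creation until the flow actually runs.
executor = DaskExecutor(cluster_class=make_cluster)
```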
j
Okay, one more thing: should the image already contain the flow? Or is there some magic where Prefect serializes the flow and copies it to the Dask workers before starting the tasks?
a
Storage defines where Prefect can find the flow, so this image doesn’t have to contain the flow; it’s only for package dependencies. But if you use Docker storage, then the image does need to contain the flow, either as a pickled flow (the default with Docker storage) or as a script via stored_as_script.
j
Makes sense, thanks 😉