# prefect-community
j
Hello from Chicago! I'm new to this space, so if this is the wrong channel please let me know and I'll move my question. I have a flow that could benefit from parallelism, and code that can provision / start a Dask cluster in Kubernetes. I know that I want to use a `DaskExecutor` to speed it up. It's clear from the documentation how to do this if you already have a Dask cluster up and running and just want to use it as an executor for a flow run. For my use case, I'd like to start the Dask cluster at the beginning of a flow run and stop it at the end of the flow run. Right now I'm running this flow in a standalone Python process (just a script with flow code that ends in `flow.run()`), not using an agent talking to Prefect Cloud. What is the recommended way to get the behavior I want, where the Dask cluster gets started when the flow run starts and stopped when it stops, without using Prefect Cloud?
• Somehow use Dask Cloud Provider Environment or Dask Kubernetes Environment without Prefect Cloud
• Extend `DaskExecutor` by overriding its setup and teardown to start / stop the cluster
• Something else that I'm missing
• It's not possible; use Prefect Cloud
Thanks very much!
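For reference, the already-documented case mentioned above (pointing a `DaskExecutor` at a cluster that is already running) looks roughly like the sketch below; the import path assumes a Prefect 0.x-era layout and the scheduler address is a placeholder:
```python
from prefect import Flow, task
from prefect.engine.executors import DaskExecutor

@task
def say_hello():
    print("hello from a Dask worker")

with Flow("existing-cluster-example") as flow:
    say_hello()

# Point the executor at a cluster that is already up
# ("tcp://dask-scheduler:8786" is a placeholder address).
flow.run(executor=DaskExecutor(address="tcp://dask-scheduler:8786"))
```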
n
Hi @james.lamb and welcome! This isn't natively supported yet without Cloud, but there's a ticket that proposes managing the Dask cluster lifecycle in the executor (which would do pretty much what you're hoping here). In the meantime, the best I can recommend would be to bake this into your code before/after starting your run (understanding this isn't a very elegant solution):
```python
# Create your Dask cluster
flow.run()
# Tear down your Dask cluster
```
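Spelled out a little more concretely, that wrap-around approach might look like the sketch below; it assumes `flow` is already defined in the same script, and it uses dask-kubernetes' `KubeCluster` with a hypothetical `worker-spec.yml` pod spec:
```python
from dask_kubernetes import KubeCluster
from prefect.engine.executors import DaskExecutor

# Create your Dask cluster ("worker-spec.yml" is a placeholder pod spec)
cluster = KubeCluster.from_yaml("worker-spec.yml")
cluster.scale(4)

# Run the flow against that cluster
flow.run(executor=DaskExecutor(address=cluster.scheduler_address))

# Tear down your Dask cluster
cluster.close()
```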
j
Thanks for the quick response! I failed to mention a really important detail: my flow is running on an `IntervalSchedule`. That means that the `# Tear down your Dask cluster` code would never be reached, right?
n
Ah yeah, that's correct
In that case, there's probably a more convoluted world where the last task in your flow could kick off an external script that handles the teardown?
j
I had considered that, but I don't think you could have a task running on a DaskExecutor stop that executor's Dask cluster, right? Communication with Dask would be lost and the flow would consider the task failed, I think
unless you mean "kick it off and hope that it takes longer to complete than your task"
Other than "it might be obsolete whenever https://github.com/PrefectHQ/prefect/issues/2508 is addressed", is there any other reason that the writing my own extension of
DaskExecutor
with these hooks would be a bad idea?
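For what it's worth, such an extension might take a shape like the sketch below. This is a guess rather than a confirmed API: it assumes a Prefect 0.x-style `DaskExecutor` that connects to `self.address` inside a `start()` context manager entered around each flow run, and it uses dask-kubernetes' `KubeCluster` with a hypothetical `worker-spec.yml` pod spec.
```python
from contextlib import contextmanager

from dask_kubernetes import KubeCluster
from prefect.engine.executors import DaskExecutor


class EphemeralClusterExecutor(DaskExecutor):
    """Hypothetical DaskExecutor subclass that owns the Dask cluster lifecycle."""

    def __init__(self, pod_spec="worker-spec.yml", n_workers=4, **kwargs):
        self._pod_spec = pod_spec
        self._n_workers = n_workers
        self._cluster = None
        super().__init__(**kwargs)

    @contextmanager
    def start(self):
        # Assumed hook: the flow runner enters start() around a flow run.
        self._cluster = KubeCluster.from_yaml(self._pod_spec)
        self._cluster.scale(self._n_workers)
        self.address = self._cluster.scheduler_address
        try:
            with super().start():
                yield
        finally:
            # Tear down the cluster once the flow run finishes.
            self._cluster.close()
            self._cluster = None
```
If `start()` really is entered once per flow run, `flow.run(executor=EphemeralClusterExecutor())` would spin the cluster up and tear it down around each scheduled run.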
n
Yeah definitely not an ideal solution there
I don't think that sounds like a bad idea and if you'd be willing to share it I'm sure others could benefit from an extension like that!
j
Ok, thank you! I definitely would like to help out on that issue you linked. I just subscribed to it, and I'll share my use case from this post on there as another argument in favor of it. If I end up going down the road of extending `DaskExecutor`, I'll definitely share what I learned and how it might inform the implementation for that issue. Thanks very much for your help!
n
Great! Sorry I couldn't provide a better solution, definitely share your use case there, that'll be very helpful.
j
no prob! Honestly even just telling me "that sounds like a reasonable path even if it's non-ideal" is really helpful