Hello from Chicago! I'm new to this space, so if this is the wrong channel please let me know and I'll move my question. I have a flow that could benefit from parallelism, and code that can provision / start a Dask cluster in Kubernetes. I know that I want to use a Dask-based executor to speed it up. It's clear from the documentation how to do this if you already have a Dask cluster up and running and just want to use it as an executor for a flow run. For my use case, I'd like to start the Dask cluster at the beginning of a flow run and stop it at the end of a flow run. Right now I'm running this flow in a standalone Python process (just a script with flow code that runs the flow directly at the end), not using an agent talking to Prefect Cloud. What is the recommended way to get the behavior I want, where the Dask cluster gets started when the flow run starts and stopped when it stops, without using Prefect Cloud?
• Somehow use Dask Cloud Provider Environment or Dask Kubernetes Environment without Prefect Cloud
• Extend DaskExecutor by overriding its setup and teardown to start / stop the cluster
• Something else that I'm missing
• It's not possible, use Prefect Cloud
Thanks very much!
Hi @james.lamb and welcome! This isn't natively supported yet without Cloud, but there's a ticket that proposes managing the Dask cluster lifecycle in the executor (which would do pretty much what you're hoping here). In the meantime, the best I can recommend would be to bake this into your code before/after starting your run (understanding this isn't a very elegant solution):
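For readers of this thread: the "bake it in before/after the run" idea can be sketched roughly as below. This is only a minimal sketch under assumptions. `FakeCluster`, `managed_cluster`, and `run_with_cluster` are hypothetical names made up for illustration, and the fake cluster stands in for whatever object your own Kubernetes provisioning code returns (assumed here to expose a `scheduler_address` and a `close()` method).

```python
from contextlib import contextmanager


class FakeCluster:
    """Hypothetical stand-in for a real Dask cluster object."""

    def __init__(self):
        self.running = True
        self.scheduler_address = "tcp://127.0.0.1:8786"

    def close(self):
        self.running = False


@contextmanager
def managed_cluster(start_cluster):
    """Start a cluster, hand it to the caller, and always close it afterwards."""
    cluster = start_cluster()
    try:
        yield cluster
    finally:
        # Runs whether the flow run succeeded or raised.
        cluster.close()


def run_with_cluster(start_cluster, run_flow):
    """Run `run_flow(scheduler_address)` while the cluster is alive."""
    with managed_cluster(start_cluster) as cluster:
        return run_flow(cluster.scheduler_address)
```

In a real script, `run_flow` would presumably be something like `lambda addr: flow.run(executor=DaskExecutor(address=addr))`. Note that the cleanup only happens once `run_flow` returns or raises, so this pattern does not help if the run call never returns.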
Thanks for the quick response! I failed to mention a really important detail... my flow is running on a schedule. That means that the teardown code after the run would never be reached, right?
Ah yeah, that's correct
In that case, there's probably a more convoluted world where the last task in your flow could kick off an external script that handles the teardown?
I had considered that, but I don't think you could have a task running on a DaskExecutor stop that executor's Dask cluster, right? Communication with Dask would be lost and the flow would consider the task failed, I think
unless you mean "kick it off and hope that it takes longer to complete than your task"
Other than "it might be obsolete whenever https://github.com/PrefectHQ/prefect/issues/2508 is addressed", is there any other reason that writing my own extension of DaskExecutor with these hooks would be a bad idea?
Yeah definitely not an ideal solution there
I don't think that sounds like a bad idea and if you'd be willing to share it I'm sure others could benefit from an extension like that!
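For anyone landing here later, the executor-extension idea discussed above might look roughly like the sketch below. It is not the approach from the linked issue and not a tested Prefect extension. It assumes the executor exposes a `start()` context manager that is entered once per flow run; `BaseExecutor` is a hypothetical stand-in for the real executor class so the example runs on its own, and `FakeCluster` and `ClusterManagingExecutor` are likewise made-up names.

```python
from contextlib import contextmanager


class FakeCluster:
    """Hypothetical stand-in for a real Dask cluster object."""

    def __init__(self):
        self.running = True
        self.scheduler_address = "tcp://127.0.0.1:8786"

    def close(self):
        self.running = False


class BaseExecutor:
    """Hypothetical stand-in for the executor being extended.

    Assumption: the flow runner wraps each flow run in `with executor.start():`.
    """

    def __init__(self, address=None):
        self.address = address
        self.connected = False

    @contextmanager
    def start(self):
        self.connected = True
        try:
            yield
        finally:
            self.connected = False


class ClusterManagingExecutor(BaseExecutor):
    """Starts a cluster on entry to start() and closes it on exit."""

    def __init__(self, start_cluster):
        super().__init__(address=None)
        self._start_cluster = start_cluster
        self.cluster = None

    @contextmanager
    def start(self):
        # Setup hook: provision the cluster and point the executor at it.
        self.cluster = self._start_cluster()
        self.address = self.cluster.scheduler_address
        try:
            with super().start():
                yield
        finally:
            # Teardown hook: runs after the parent executor has disconnected.
            self.cluster.close()
```

Because the teardown lives inside the executor rather than after the run call, it would fire at the end of each run even when the flow is scheduled, provided the assumption about `start()` being entered per run holds.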
Ok thank you! I definitely would like to help out on that issue you linked. I just subscribed to it, and I'll share my use case from this post on there as another argument in favor of it. If I end up going down this road of extending DaskExecutor, I'll definitely share what I learned and how it might inform the implementation for that issue. Thanks very much for your help!
Great! Sorry I couldn't provide a better solution, definitely share your use case there, that'll be very helpful.
no prob! Honestly even just telling me "that sounds like a reasonable path even if it's non-ideal" is really helpful
