Thread
#prefect-community
    james.lamb
    2 years ago
    Hello from Chicago! I'm new to this space, so if this is the wrong channel please let me know and I'll move my question. I have a flow that could benefit from parallelism, and code that can provision / start a Dask cluster in Kubernetes. I know that I want to use a DaskExecutor to speed it up. It's clear from the documentation how to do this if you already have a Dask cluster up and running and just want to use it as an executor for a flow run. For my use case, I'd like to start the Dask cluster at the beginning of a flow run and stop it at the end of a flow run. Right now I'm running this flow in a standalone Python process (just a script with flow code that ends in flow.run()), not using an agent talking to Prefect Cloud. What is the recommended way to get the behavior I want, where the Dask cluster gets started when the flow run starts and stopped when it stops, without using Prefect Cloud?
    • Somehow use Dask Cloud Provider Environment or Dask Kubernetes Environment without Prefect Cloud
    • Extend DaskExecutor by overriding its setup and teardown to start / stop the cluster
    • something else that I'm missing
    • it's not possible, use Prefect Cloud
    Thanks very much!
    nicholas
    2 years ago
    Hi @james.lamb and welcome! This isn't natively supported yet without Cloud, but there's a ticket that proposes managing the Dask cluster lifecycle in the executor (which would do pretty much what you're hoping here). In the meantime, the best I can recommend would be to bake this into your code before/after starting your run (understanding this isn't a very elegant solution):
    ## Create your Dask cluster
    flow.run()
    ## Tear down your Dask cluster
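    A minimal sketch of that workaround, assuming Prefect 0.x's DaskExecutor and dask-kubernetes; the pod spec path, worker count, and task are placeholders:
    from dask_kubernetes import KubeCluster
    from prefect import Flow, task
    from prefect.engine.executors import DaskExecutor

    @task
    def say_hello():
        print("hello from a Dask worker")

    with Flow("example") as flow:
        say_hello()

    ## Create your Dask cluster
    cluster = KubeCluster.from_yaml("worker-spec.yaml")  # placeholder pod spec
    cluster.scale(5)

    try:
        ## Run the flow against the cluster you just created
        flow.run(executor=DaskExecutor(address=cluster.scheduler_address))
    finally:
        ## Tear down your Dask cluster
        cluster.close()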
    james.lamb
    2 years ago
    Thanks for the quick response! I failed to mention a really important detail... my flow is running on an IntervalSchedule. That means that the "## Tear down your Dask cluster" code would never be reached, right?
    nicholas
    2 years ago
    Ah yeah, that's correct
    In that case, there's probably a more convoluted world where the last task in your flow could kick off an external script that handles the teardown?
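    A rough sketch of that kind of final task, assuming Prefect 0.x; the teardown script name is a placeholder:
    import subprocess
    from prefect import task

    @task
    def kick_off_teardown():
        # launch a detached process so the teardown can outlive the flow run
        subprocess.Popen(["python", "teardown_cluster.py"], start_new_session=True)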
    james.lamb
    2 years ago
    I had considered that, but I don't think you could have a task running on a DaskExecutor stop that executor's Dask cluster, right? Communication with Dask would be lost and the flow would consider the task failed, I think
    unless you mean "kick it off and hope that it takes longer to complete than your task"
    Other than "it might be obsolete whenever https://github.com/PrefectHQ/prefect/issues/2508 is addressed", is there any other reason that writing my own extension of DaskExecutor with these hooks would be a bad idea?
    nicholas
    2 years ago
    Yeah definitely not an ideal solution there
    I don't think that sounds like a bad idea and if you'd be willing to share it I'm sure others could benefit from an extension like that!
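    A rough sketch of what that extension might look like, assuming Prefect 0.x, where the executor's start() method is a context manager entered at the start of a flow run and exited at the end; the class name and pod spec path are illustrative, and the exact DaskExecutor internals vary by version:
    from contextlib import contextmanager

    from dask_kubernetes import KubeCluster
    from prefect.engine.executors import DaskExecutor

    class SelfProvisioningDaskExecutor(DaskExecutor):
        # DaskExecutor that creates and destroys its own Dask cluster per flow run
        def __init__(self, worker_spec_yaml, n_workers=5, **kwargs):
            self.worker_spec_yaml = worker_spec_yaml
            self.n_workers = n_workers
            super().__init__(**kwargs)

        @contextmanager
        def start(self):
            # provision the cluster just before the flow run begins
            cluster = KubeCluster.from_yaml(self.worker_spec_yaml)
            cluster.scale(self.n_workers)
            self.address = cluster.scheduler_address
            try:
                # let DaskExecutor.start() connect a Client to that address
                with super().start():
                    yield
            finally:
                # tear the cluster down once the flow run finishes
                cluster.close()

    # usage, given a flow defined as in the earlier sketch
    flow.run(executor=SelfProvisioningDaskExecutor("worker-spec.yaml"))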
    james.lamb
    2 years ago
    Ok thank you! I definitely would like to help out on that issue you linked. I just subscribed to it, and I'll share my use case from this post on there as another argument in favor of it. If I end up going down this road of extending DaskExecutor, I'll definitely share what I learned and how it might inform the implementation for that issue. Thanks very much for your help!
    nicholas
    2 years ago
    Great! Sorry I couldn't provide a better solution, definitely share your use case there, that'll be very helpful.
    james.lamb
    2 years ago
    no prob! Honestly even just telling me "that sounds like a reasonable path even if it's non-ideal" is really helpful