Thread
#prefect-community
    t

    Tomer Cagan

    6 months ago
    Hi, A general question about how dask is used under the hood. In "regular" dask code there's usually a client (script or from REPL) which connects to a cluster (ephemeral or long-running) and then it can submit work to that cluster. How this is done in Prefect? Assuming I am using DaskExecutor and running on k8s with kubernetes agent - does the client code run in the agents itself? As part of the KubeCluster? Where should I look for the code running it - k8s agent?
    Anna Geller

    Anna Geller

    6 months ago
    Oh this is actually a common question, let me write this up on Discourse - I'll share the link soon
    t

    Tomer Cagan

    6 months ago
    Thanks for this! In the discord link you sent, the first item "Prefect initializes the Dask client" - I assume Prefect here is referring to the agent that runs the flow? So if I am considering using DaskExecutor with a k8s agent, the "client" code would be running in the agent, and in case I handle large data, it can potentially affect the agent? Do you know if it is possible to use dask's scatter (as in here) to keep the data in dask?
    Anna Geller

    Anna Geller

    6 months ago
    if I am considering using DaskExecutor with a k8s agent, the "client" code would be running in the agent
    on K8s it depends ā€¢ if you use LocalDaskExecutor, all execution happens within flow run pod running independently of Kubernetes agent's pod ā€¢ if you run DaskExecutor e.g. with KubeCluster, then it runs on the Dask cluster spun up o K8s (a full cluster with many pods, likely across multiple distributed compute nodes) - regarding KubeCluster, check out this blog post
    Do you know if it is possible to use dask's scatter (as in here) to keep the data in dask?
    For such Dask questions, I recommend posting it on Dask Discourse
    t

    Tomer Cagan

    6 months ago
    Regarding dask/client - when running k8s, it is using KubernetesRun and the "client" code actually runs in that pod - then, in case of local cluster, the dask worker runs on the same resources, and in case of KubeCluster, it runs on additional resources created for the KubeCluster dask cluster. Did I get you right? As for the scatter - I wonder if they would know since Prefect is handling the orchestration. I guess I could give it a try.
    Anna Geller

    Anna Geller

    6 months ago
    yup, you're correct! ā€¢ LocalDaskExecutor = all task runs are executed within the flow run pod ā€¢ DaskExecutor(cluster_class=KubeCluster()) = flow run is executed within the flow run pod, but task runs are executed on a distributed Kube Dask cluster across many pods/nodes
    t

    Tomer Cagan

    6 months ago
    Maybe add this to the discourse article you made? I think that reading it explicitly would have helped me understanding what's going on faster... šŸ™‚