# ask-community
r
Hi folks, I have a question about the Dask adaptive cluster and Prefect. We have a task which is mapped, and each of these mapped tasks is quite long (~10 min) and resource-intensive, leading the Dask adaptive cluster to scale up. But because it takes some time for the new workers to become available (K8s has to allocate new nodes ...), basically all the mapped tasks are given to the first existing worker, and they are not redistributed once the new workers finally become available. Has anyone encountered such an issue, and how have you dealt with it?
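(For context, a minimal sketch of the kind of mapped flow being described; the task body, flow name, and input size are illustrative, not from this thread:)
from prefect import Flow, task

@task
def heavy_step(item):
    # Stand-in for the ~10 min, resource-intensive work done per item
    return item

with Flow("mapped-heavy-work") as flow:
    results = heavy_step.map(list(range(50)))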
k
Hey @Romain, are these being spun up with the Prefect Executor? Or is the cluster handling it without Prefect knowing about it?
r
By Prefect, as a temporary cluster
k
Could you show me a sample code snippet?
r
executor = DaskExecutor(
    cluster_class='dask_kubernetes.KubeCluster',
    cluster_kwargs={
        'pod_template': 'dask_worker_pod_template.yaml',
        'scheduler_pod_template': scheduler_pod_template,
        'namespace': 'emtrails',
        'n_workers': 1,
    },
    adapt_kwargs={'minimum': 1, 'maximum': 10},
)
flow.executor = executor
And then it is run from prefect server using a K8s agent
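(For completeness, a minimal sketch of registering such a flow so a Kubernetes agent picks it up in Prefect 1.x; the run config and project name are assumptions, not from this thread:)
from prefect.run_configs import KubernetesRun

# Hypothetical registration step; the agent polling Prefect Server then
# launches the flow run as a Kubernetes job.
flow.run_config = KubernetesRun(namespace='emtrails')
flow.register(project_name='my-project')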
k
Ok I’ll ask someone with more experience than me on the team and get back to you
c
@Romain Could you try running it without n_workers=1?
I don't think you need that if you've set adaptive scaling
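(A sketch of this suggestion: the same executor with n_workers dropped so adapt_kwargs alone drives the worker count:)
executor = DaskExecutor(
    cluster_class='dask_kubernetes.KubeCluster',
    cluster_kwargs={
        'pod_template': 'dask_worker_pod_template.yaml',
        'scheduler_pod_template': scheduler_pod_template,
        'namespace': 'emtrails',
        # no n_workers here; adaptive scaling decides how many workers exist
    },
    adapt_kwargs={'minimum': 1, 'maximum': 10},
)
flow.executor = executor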
r
@ciaran I could. But just to be clear, the number of workers did scale up; they just did not get any tasks
c
I certainly don't have it set on ours and we see the tasks get spread (or at least from what I can tell)
I'm wondering if, because you have it set to one, it will only actually run tasks on one worker, but still adaptively scale
Sounds silly, but it might be 🤣
r
Ok, I'll give it a try and let you know. Thanks
c
No worries, it's a shot in the dark I'll admit
r
Oh actually, I found the issue. I still had the env variable
DASK_DISTRIBUTED__SCHEDULER__WORK_STEALING: "False"
If I recall correctly, in the early days of Prefect it was recommended to set this. I removed it and now the work is properly distributed.
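(For anyone hitting the same thing: that environment variable maps to Dask's distributed.scheduler.work-stealing setting, which is enabled by default and is what lets idle workers take queued tasks from busy ones. A quick, illustrative way to check the value the scheduler actually picked up:)
import dask
import distributed  # importing distributed registers its config defaults

# Prints False if the env var override is still in effect, True otherwise.
print(dask.config.get("distributed.scheduler.work-stealing"))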
My bad
k
Thanks for circling back on this
c
Glad it's working 😄