
Vincent

01/11/2021, 2:24 PM
I want to ask the Dask experts whether they can help identify a scheduler bottleneck I have experienced. I have a flow that schedules 22 parallel mapped jobs (each with 65 items), allowing for 1430 parallel jobs. When I schedule the job on a k8s cluster with 375 nodes (each with a worker that runs 4 threads), my scheduler emits messages like the ones below, and its CPU utilization also goes to 100%. These jobs ultimately die when all the workers get disconnected.
distributed.utils_perf - WARNING - full garbage collections took 10% CPU time recently (threshold: 10%)
...
distributed.core - INFO - Event loop was unresponsive in Scheduler for 7.07s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
...
distributed.comm.tcp - INFO - Connection closed before handshake completed
...
distributed.scheduler - INFO - Close client connection: Client-worker-3c6d8642-53b5-11eb-800e-32b98c347770
When I scale the job down to 250 nodes and 3 threads per worker, I still get 100% utilization, but it is slightly more stable, and the only warning message is:
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
Thanks for any advice!
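For context, a flow shaped like the one described might look roughly like the following sketch; the task body, registry/image names, and worker counts are placeholders rather than details from the thread:

# Rough sketch of a flow like the one described above (Prefect 0.13.x style);
# the task body, registry, and image names are placeholders.
from prefect import Flow, task
from prefect.environments import DaskKubernetesEnvironment
from prefect.environments.storage import Docker

@task
def process(item):
    return item * 2  # stand-in for the real work

with Flow("parallel-mapping") as flow:
    for _ in range(22):               # 22 mapped groups...
        process.map(list(range(65)))  # ...each over 65 items -> 1430 task runs

flow.environment = DaskKubernetesEnvironment(min_workers=375, max_workers=375)
flow.storage = Docker(registry_url="<registry>", image_name="<image>")  # placeholders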

Marwan Sarieddine

01/11/2021, 2:27 PM
Did you try increasing the resources (CPU and memory) allocated to the scheduler?

Vincent

01/11/2021, 2:32 PM
I set no CPU limits on the scheduler, but it seems to plateau at 1 CPU. Same with memory - no limits.

Dylan

01/11/2021, 3:11 PM
Hi @Vincent! Would you mind sharing your cluster configuration & Flow run config + storage?

Vincent

01/11/2021, 3:15 PM
Hi @Dylan, I am running k8s on AWS using the DaskKubernetesEnvironment (from Prefect 0.13.17), running with DockerStorage. Which configs in particular are you interested in?

Dylan

01/11/2021, 3:17 PM
I believe that uses dask-kubernetes under the hood: https://kubernetes.dask.org/en/latest/kubecluster.html
There may be a way to specify an increase in scheduler resources there.
Just in case you’re not aware, that Environment has been deprecated in favor of a new pattern for configuring Flow resources called Run Configs. More on that here: https://docs.prefect.io/orchestration/flow_config/overview.html
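As a rough illustration of that newer pattern (Prefect 0.14+), a minimal sketch might look like the following; the image, resource values, and worker count are illustrative, not taken from this thread:

# Sketch of the Run Config + executor pattern linked above (Prefect >= 0.14);
# image, resource values, and worker count are illustrative assumptions.
from dask_kubernetes import make_pod_spec
from prefect import Flow
from prefect.executors import DaskExecutor
from prefect.run_configs import KubernetesRun
from prefect.storage import Docker

flow = Flow("parallel-mapping")  # stand-in for the real flow

flow.storage = Docker(registry_url="<registry>", image_name="<image>")  # placeholders
flow.run_config = KubernetesRun(cpu_request="1", memory_request="4Gi")
flow.executor = DaskExecutor(
    cluster_class="dask_kubernetes.KubeCluster",
    cluster_kwargs={
        "pod_template": make_pod_spec(image="<image>", threads_per_worker=4),
        "n_workers": 375,
    },
)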

Marwan Sarieddine

01/11/2021, 3:21 PM
@Vincent you'll want to increase the resource spec by passing in a custom scheduler_spec_file to the DaskKubernetesEnvironment
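A minimal sketch of that suggestion (Prefect 0.13.x), with placeholder file names and a stand-in flow object:

# Minimal sketch of passing a custom scheduler spec (Prefect 0.13.x);
# the file names are placeholders for local copies of the default specs.
from prefect import Flow
from prefect.environments import DaskKubernetesEnvironment

flow = Flow("parallel-mapping")  # stand-in for the real flow

flow.environment = DaskKubernetesEnvironment(
    scheduler_spec_file="scheduler_spec.yaml",  # customized copy of the default job.yaml
    worker_spec_file="worker_spec.yaml",        # optional: customized worker pod spec
    min_workers=375,
    max_workers=375,
)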

Vincent

01/11/2021, 3:21 PM
Thanks - I am aware of the changes, but as you mentioned, it still uses dask-kubernetes under the hood (with the KubeCluster). I have not found any way to get the scheduler to use more than 1 CPU.

Marwan Sarieddine

01/11/2021, 3:22 PM
this https://github.com/PrefectHQ/prefect/blob/master/src/prefect/environments/execution/dask/job.yaml is the default spec file being used, with 100m CPU as the default
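A hedged sketch of building such a custom spec from a local copy of that default file, assuming the standard Job layout (file names and resource values are illustrative):

# Sketch: copy the default job.yaml locally, raise the scheduler container's
# CPU/memory requests, and save it as the scheduler_spec_file to pass in.
# Assumes the standard Job layout (spec.template.spec.containers[0]).
import yaml

with open("job.yaml") as f:                 # local copy of the default spec
    spec = yaml.safe_load(f)

container = spec["spec"]["template"]["spec"]["containers"][0]
container["resources"] = {
    "requests": {"cpu": "1000m", "memory": "4Gi"},
    # omit or raise "limits" so the scheduler isn't throttled at 100m
}

with open("scheduler_spec.yaml", "w") as f:
    yaml.safe_dump(spec, f)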

Dylan

01/11/2021, 3:23 PM
☝️ Marwan is right on the money 💯

Vincent

01/11/2021, 3:23 PM
Thanks @Marwan Sarieddine - I set this to 1000 and removed the limits. It should be able to scale past 1 CPU, but it does not.
I think the scheduler is single-threaded.

Marwan Sarieddine

01/11/2021, 3:24 PM
hmm strange - I don't think it is

Vincent

01/11/2021, 11:40 PM
Just as a follow-up - I upgraded my setup to use 0.14.2 for the submission. Overall, I have noticed increased stability when spawning the tasks. However, the scheduler is killed (with no error message) after ~10 minutes. =(

Marwan Sarieddine

01/12/2021, 1:45 AM
I see - perhaps someone from the Prefect team can shed some more light here - sorry, out of ideas ...
Just for clarification - when you say "no error message", does that mean the flow gets stuck in a running state?

Vincent

01/12/2021, 2:12 PM
Yes, that is correct - the job exits on the k8s cluster, but the flow run continues to show as running in the cloud service.

Marwan Sarieddine

01/12/2021, 7:49 PM
Nothing in the Kubernetes logs of the scheduler pod that points to any errors? If you are using a cluster autoscaler, make sure to annotate the scheduler and worker pods as not safe to evict.
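For reference, the annotation in question is cluster-autoscaler.kubernetes.io/safe-to-evict. A sketch of adding it to a custom scheduler spec (the file name is a placeholder; the worker pod spec would get the same annotation):

# Sketch: mark the scheduler pod as not safe to evict so the cluster
# autoscaler won't move it; apply the same annotation to the worker spec.
import yaml

with open("scheduler_spec.yaml") as f:       # placeholder file name
    spec = yaml.safe_load(f)

metadata = spec["spec"]["template"].setdefault("metadata", {})
annotations = metadata.setdefault("annotations", {})
annotations["cluster-autoscaler.kubernetes.io/safe-to-evict"] = "false"

with open("scheduler_spec.yaml", "w") as f:
    yaml.safe_dump(spec, f)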

Vincent

01/12/2021, 8:21 PM
Thanks for the advice! I think it may have been an error on my side, where I did not annotate the scheduler with any memory requests. I assumed that leaving them out would give it "unlimited" memory, but I guess it just allowed the k8s cluster to move the pod as it wished.

Marwan Sarieddine

01/12/2021, 8:25 PM
glad to know you got it fixed