I'm getting Dask timeouts with my temporary cluster...
# prefect-community
a
I'm getting Dask timeouts with my temporary clusters. I'm running Dask in GKE with Prefect Cloud, and I'm seeing a flow hang on:
Creating a new Dask cluster with `__main__.get_executor.<locals>.<lambda>`...
In my GKE logs, I'm seeing:
    raise TimeoutError(
asyncio.exceptions.TimeoutError: Nanny failed to start in 60 seconds
k
This seems like a Dask-specific issue, but I am not seeing much beyond this. Can the workers connect to the scheduler?
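One rough way to answer the connectivity question is to open a plain TCP connection to the scheduler from inside a worker pod. The sketch below is only an illustration: the `can_reach` helper is hypothetical, and it assumes the scheduler listens on Dask's default port 8786, which may not match a given deployment.

```python
import socket


def can_reach(host: str, port: int = 8786, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    8786 is the default Dask scheduler port; this is an assumption about
    the deployment, so pass the real port if it differs.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
```

Calling `can_reach("<scheduler-service-name>")` from a shell inside a worker pod would show whether the scheduler address is reachable at the network level. A `False` result points at networking or DNS rather than at Dask itself.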
a
I'm sure it is Dask-specific, but I don't know much about Dask. I think the issue is that they can't connect to the scheduler, if I'm reading the logs correctly. I just created the cluster like so:
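import prefect
from dask_kubernetes import KubeCluster, make_pod_spec
from prefect.executors import DaskExecutor

DaskExecutor(
    cluster_class=lambda: KubeCluster(make_pod_spec(image=prefect.context.image), env=env),
    adapt_kwargs={"minimum": 2, "maximum": 3},
)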
So a new temporary cluster gets spun up with each flow, which seems to be the preferred way to run a flow with a Dask executor. Is there a recommended way to set up the temporary cluster?
k
I’ve done the same on GKE Autopilot before and it worked for me. I was following this. Do you have quotas that prevent you from spinning up more workers?
a
I also followed that, but I'm not using Autopilot because I was seeing different issues: https://prefect-community.slack.com/archives/CL09KU1K7/p1650566183898799 I don't think I have quotas set, but which ones should I be checking for?
k
Ah, I see. I think with quotas, you would see pods failing to start?
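To check for pods that are failing to start, one option is to list the pods in the namespace and look at their phase. This is a minimal sketch, not a definitive diagnostic: the helper names `stuck_pods` and `list_pod_phases` are hypothetical, the `default` namespace is an assumption, and it relies on the official `kubernetes` Python client being installed with a working kubeconfig.

```python
from typing import Iterable, List, Tuple


def stuck_pods(pods: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the names of pods whose phase is Pending or Failed."""
    return [name for name, phase in pods if phase in {"Pending", "Failed"}]


def list_pod_phases(namespace: str = "default") -> List[Tuple[str, str]]:
    """Fetch (name, phase) pairs for every pod in `namespace`.

    Requires the `kubernetes` package; imported lazily so that
    `stuck_pods` can be used without it.
    """
    from kubernetes import client, config

    config.load_kube_config()  # reads the local kubeconfig, as kubectl does
    pods = client.CoreV1Api().list_namespaced_pod(namespace)
    return [(p.metadata.name, p.status.phase) for p in pods.items]
```

Running `stuck_pods(list_pod_phases())` while the flow hangs would list any worker pods that never reached `Running`. The events on those pods (for example via `kubectl describe pod`) are then the place to look for scheduling or resource messages.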