I'm getting Dask timeouts with my temporary clusters. I'm running Dask in GKE with Prefect Cloud, and I'm seeing a flow hang on
Creating a new Dask cluster with `__main__.get_executor.<locals>.<lambda>`...
In my GKE logs, I'm seeing
raise TimeoutError(
asyncio.exceptions.TimeoutError: Nanny failed to start in 60 seconds
Kevin Kho
05/16/2022, 5:40 PM
This seems like a Dask-specific issue, but I am not seeing much beyond this error. Can the workers connect to the scheduler?
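One rough way to answer the connectivity question is to try opening a TCP connection to the scheduler from inside a worker pod. The sketch below uses only the standard library; the function name `can_reach_scheduler` and the host `dask-scheduler` are hypothetical placeholders, and port 8786 is assumed because it is the Dask scheduler's default. It only tells you whether the port is reachable, not whether the worker would register successfully.

```python
import socket


def can_reach_scheduler(host: str, port: int = 8786, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


# Example (run from inside a worker pod; "dask-scheduler" is a placeholder
# for whatever address your workers are given for the scheduler):
# print(can_reach_scheduler("dask-scheduler"))
```

If this returns False from a worker pod, the problem is likely networking or DNS inside the cluster rather than Dask itself.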
Andrew Lawlor
05/16/2022, 5:50 PM
I'm sure it is Dask-specific, but I don't know much about Dask. If I'm reading the logs correctly, I think the issue is that the workers can't connect to the scheduler.
i just created the cluster like so
So a new temporary cluster gets spun up with each flow, which seems to be the preferred way to run a flow with a Dask executor. Is there a recommended way to set up the temporary cluster?
Kevin Kho
05/16/2022, 5:55 PM
I’ve done the same on GKE Autopilot before and it’s worked for me. I was following this. Do you have quotas that prevent you from spinning up more workers?