Robin

10/01/2020, 2:48 PM
Dear all, I just now got the following error:
Unexpected error: OSError("Timed out trying to connect to 'tcp://10.100.0.100:44991' after 10 s: Timed out trying to connect to 'tcp://10.100.0.100:44991' after 10 s: connect() didn't finish in time")
Is this a known issue? The environment is a dask-kubernetes cluster running on AWS EKS...

nicholas

10/01/2020, 2:50 PM
Hi @Robin! Can you provide a little more context for this error? Where did it show up?

Robin

10/01/2020, 2:56 PM
It shows up when trying to run a flow:
Submitted for execution: Job prefect-job-5bf53878

Adaptive scaling started: minimum=100 maximum=100

Beginning Flow run
Then, the error message shows up
Afterwards, the dask pods are deleted
Deleted pod: dask-root-cec0f0d6-6zbxsc
and the flow is restarted:
backend_accure_net restarted this flow run

nicholas

10/01/2020, 3:00 PM
Got it - is this running against Prefect Cloud or Server?

Robin

10/01/2020, 3:00 PM
Cloud

nicholas

10/01/2020, 3:03 PM
Thanks @Robin - this looks like an issue with your Dask cluster; can you confirm that your cluster is up and healthy in those situations?
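One quick way to check whether the scheduler is reachable at all is to try opening a plain TCP connection to the address from the error message. This is only a rough sketch using the Python standard library: the helper name `scheduler_reachable` is made up for illustration and is not part of Prefect or Dask, and a successful connection only shows that the port is open, not that the cluster is fully healthy.

```python
import socket
from urllib.parse import urlparse


def scheduler_reachable(address: str, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to `address` opens within `timeout` seconds.

    `address` is expected in the form shown in the error, e.g. "tcp://10.100.0.100:44991".
    Hypothetical helper for illustration; not a Prefect or Dask API.
    """
    parsed = urlparse(address)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"expected tcp://host:port, got {address!r}")
    try:
        # create_connection raises an OSError subclass on timeout or refusal
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Address copied from the error message above; adjust to your own scheduler.
    print(scheduler_reachable("tcp://10.100.0.100:44991", timeout=10.0))
```

Since 10.100.0.100 is a private address, this would need to run from somewhere inside the cluster network (for example, a pod in the same EKS cluster) to give a meaningful answer.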

Robin

10/01/2020, 3:36 PM
Oh, I broke the number one rule of debugging:
Have you tried turning it off and on again?
Uploading the flow again solved the problem...

nicholas

10/01/2020, 3:37 PM
😄 no worries! Let me know if you run into that again

Robin

10/01/2020, 4:22 PM
Will do 🙂