Severin Ryberg [sevberg]
12/18/2020, 9:35 PM

Marwan Sarieddine
12/18/2020, 9:47 PM
scheduler_spec_file in case you are using a DaskKubernetesEnvironment)

Severin Ryberg [sevberg]
12/18/2020, 10:05 PM

Marwan Sarieddine
12/18/2020, 11:08 PM

Severin Ryberg [sevberg]
12/19/2020, 12:00 PM

Marwan Sarieddine
12/19/2020, 3:34 PM
"Setting the job timeout to 10 minutes (far longer than the 2-5 minutes it takes the overall flow to fail)"
Did you do so by setting this environment variable?
"DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT": "600s",
I am using the cluster autoscaler without any issues, but perhaps only because I am not encountering the edge case described in the issue you referenced.
failed to connect after 600s?
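For context on the variable above: Dask reads `DASK_`-prefixed environment variables into its config, with `__` marking nesting, so `DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT` should correspond to the key `distributed.comm.timeouts.connect` (readable with `dask.config.get("distributed.comm.timeouts.connect")` when Dask is installed). The sketch below only mimics that naming convention with hypothetical helpers (`env_to_dask_key`, `parse_seconds`); it does not call Dask. It also shows that "600s" is the same 10 minutes mentioned in the quoted message. Because Dask reads the environment when it is imported, the variable presumably has to be present in the scheduler and worker pod environments, not only on the machine that submits the flow.

```python
# Hypothetical helpers that MIMIC (they do not call) Dask's convention for
# mapping DASK_-prefixed environment variables onto nested config keys.

def env_to_dask_key(name: str) -> str:
    """Turn e.g. DASK_A__B__C into the dotted config key 'a.b.c'."""
    prefix = "DASK_"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} is not a DASK_ environment variable")
    # Strip the prefix, lowercase, and treat "__" as the nesting separator.
    return name[len(prefix):].lower().replace("__", ".")


def parse_seconds(value: str) -> float:
    """Minimal parser for '<number><unit>' strings; only s, m, h are handled."""
    units = {"s": 1, "m": 60, "h": 3600}
    number, unit = value[:-1], value[-1]
    if unit not in units:
        raise ValueError(f"unsupported unit in {value!r}")
    return float(number) * units[unit]


if __name__ == "__main__":
    name = "DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT"
    # prints: distributed.comm.timeouts.connect 600.0
    print(env_to_dask_key(name), parse_seconds("600s"))
```

The real Dask parser accepts more units and formats than this sketch; it is only meant to make the name-to-key mapping and the 600 s = 10 min equivalence explicit.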