Matic Lubej
01/19/2021, 8:18 PMdask_cloudprovider
API. running tutorials and sample code from dask
this works great, but for prefect I have created a dedicated docker image which I provide to the cluster initializer. The cluster gets created, but as soon as the flow starts, after 10 s I get the following time-out error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/prefect/engine/runner.py", line 48, in inner
new_state = method(self, state, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/prefect/engine/flow_runner.py", line 418, in get_flow_run_state
with self.check_for_cancellation(), executor.start():
File "/usr/local/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.8/site-packages/prefect/executors/dask.py", line 203, in start
with Client(self.address, **self.client_kwargs) as client:
File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 748, in __init__
self.start(timeout=timeout)
File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 953, in start
sync(self.loop, self._start, **kwargs)
File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 340, in sync
raise exc.with_traceback(tb)
File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 324, in f
result[0] = yield future
File "/usr/local/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 1043, in _start
await self._ensure_connected(timeout=timeout)
File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 1100, in _ensure_connected
comm = await connect(
File "/usr/local/lib/python3.8/site-packages/distributed/comm/core.py", line 308, in connect
raise IOError(
OSError: Timed out trying to connect to <tcp://172.31.40.184:8786> after 10 s
[2021-01-19 20:14:19+0000] ERROR - prefect.Execute process | Unexpected error occured in FlowRunner: OSError('Timed out trying to connect to <tcp://172.31.40.184:8786> after 10 s')
Traceback (most recent call last):
File "s3_process_l2a_2019.py", line 114, in <module>
assert status.is_successful()
AssertionError
Any ideas what is going on? Is the dask scheduler having issues connecting to the workers? Or might it be something else?
Thanks!josh
01/19/2021, 8:53 PMMatic Lubej
01/20/2021, 9:41 PM