API. running tutorials and sample code from
this works great, but for Prefect I have created a dedicated Docker image, which I pass to the cluster initializer. The cluster gets created, but as soon as the flow starts, it fails after 10 s with the timeout error below:
Any ideas what is going on? Is the Dask scheduler having issues connecting to the workers? Or might it be something else? Thanks!
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/runner.py", line 48, in inner
    new_state = method(self, state, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/flow_runner.py", line 418, in get_flow_run_state
    with self.check_for_cancellation(), executor.start():
  File "/usr/local/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.8/site-packages/prefect/executors/dask.py", line 203, in start
    with Client(self.address, **self.client_kwargs) as client:
  File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 748, in __init__
    self.start(timeout=timeout)
  File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 953, in start
    sync(self.loop, self._start, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 340, in sync
    raise exc.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 324, in f
    result = yield future
  File "/usr/local/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 1043, in _start
    await self._ensure_connected(timeout=timeout)
  File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 1100, in _ensure_connected
    comm = await connect(
  File "/usr/local/lib/python3.8/site-packages/distributed/comm/core.py", line 308, in connect
    raise IOError(
OSError: Timed out trying to connect to <tcp://172.31.40.184:8786> after 10 s
[2021-01-19 20:14:19+0000] ERROR - prefect.Execute process | Unexpected error occured in FlowRunner: OSError('Timed out trying to connect to <tcp://172.31.40.184:8786> after 10 s')
Traceback (most recent call last):
  File "s3_process_l2a_2019.py", line 114, in <module>
    assert status.is_successful()
AssertionError
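For what it's worth, the traceback fails inside `Client(self.address, ...)`, so the step that times out is the flow process opening a TCP connection to the scheduler at 172.31.40.184:8786, not the scheduler talking to its workers. A rough way to check whether that address is reachable at all from the machine running the flow is a plain socket test. This is only a sketch: `can_reach` is a made-up helper name, not part of Prefect or Dask, and the 3-second timeout is an arbitrary choice.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False


# Example call, using the scheduler address from the error message:
# can_reach("172.31.40.184", 8786)
```

If this returns False, the likely suspects are networking (172.31.x.x is a private address, so the flow has to run somewhere that can route to it) or firewall/security-group rules for port 8786. If it returns True but the client still times out, raising Dask's `distributed.comm.timeouts.connect` setting might be worth a try.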