# ask-community
m
Hi again! I'm trying to run a process over a Fargate cluster using the dask_cloudprovider API. Running the tutorials and sample code from dask works great, but for Prefect I have created a dedicated Docker image, which I provide to the cluster initializer. The cluster gets created, but as soon as the flow starts, I get the following time-out error after 10 s:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/runner.py", line 48, in inner
    new_state = method(self, state, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/flow_runner.py", line 418, in get_flow_run_state
    with self.check_for_cancellation(), executor.start():
  File "/usr/local/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.8/site-packages/prefect/executors/dask.py", line 203, in start
    with Client(self.address, **self.client_kwargs) as client:
  File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 748, in __init__
    self.start(timeout=timeout)
  File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 953, in start
    sync(self.loop, self._start, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 340, in sync
    raise exc.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 324, in f
    result[0] = yield future
  File "/usr/local/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 1043, in _start
    await self._ensure_connected(timeout=timeout)
  File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 1100, in _ensure_connected
    comm = await connect(
  File "/usr/local/lib/python3.8/site-packages/distributed/comm/core.py", line 308, in connect
    raise IOError(
OSError: Timed out trying to connect to <tcp://172.31.40.184:8786> after 10 s
[2021-01-19 20:14:19+0000] ERROR - prefect.Execute process | Unexpected error occured in FlowRunner: OSError('Timed out trying to connect to <tcp://172.31.40.184:8786> after 10 s')
Traceback (most recent call last):
  File "s3_process_l2a_2019.py", line 114, in <module>
    assert status.is_successful()
AssertionError
Any ideas what is going on? Is the dask scheduler having issues connecting to the workers? Or might it be something else? Thanks!
j
Hi @Matic Lubej, I can’t say for certain based on the information you provided, but it looks like wherever your flow is running, it cannot communicate with your Dask scheduler.
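One quick way to test that hypothesis is to try a plain TCP connection to the scheduler address from the same place the flow runs. The sketch below uses only the Python standard library. The helper name can_reach is made up for illustration, and a successful TCP connect only shows that the port is reachable, not that the Dask handshake (for example TLS) will succeed.

```python
import socket


def can_reach(host, port, timeout=10.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Hypothetical helper for illustration; it checks network reachability only
    and says nothing about whether the Dask protocol or TLS handshake works.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For the address in the traceback, that would be can_reach("172.31.40.184", 8786). If it returns False, the likely culprits are networking (security groups, private versus public IPs); if it returns True, the problem is more likely at the protocol level.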
m
It turned out that I had to add the security information to the Dask executor, as described in https://docs.prefect.io/core/advanced_tutorials/dask-cluster.html#an-example-flow Thanks!
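For anyone finding this later, here is a rough sketch of what passing security information to the executor can look like. It is not the exact configuration used in this thread, which was not posted. It assumes a Prefect 0.14-era install (where DaskExecutor lives in prefect.executors, as in the traceback), a TLS-secured scheduler, and certificate paths that you supply yourself. The function name make_secure_executor and its parameters are hypothetical.

```python
def make_secure_executor(address, ca_file, cert_file, key_file):
    """Build a Prefect DaskExecutor whose Dask Client presents TLS credentials.

    Hypothetical helper: the name, the parameters, and the choice of TLS are
    assumptions for illustration, not the configuration used in this thread.
    """
    # Imported lazily so this module still loads where Prefect/Dask are absent.
    from distributed.security import Security
    from prefect.executors import DaskExecutor  # Prefect 0.14-era import path

    security = Security(
        tls_ca_file=ca_file,
        tls_client_cert=cert_file,
        tls_client_key=key_file,
        require_encryption=True,
    )
    # The traceback shows Client(self.address, **self.client_kwargs), so
    # everything in client_kwargs is forwarded to distributed.Client.
    return DaskExecutor(address=address, client_kwargs={"security": security})
```

The result would then be handed to the flow run, for example flow.run(executor=make_secure_executor(...)). With a TLS-secured scheduler the address usually starts with tls:// rather than tcp://.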