# prefect-community
Hi - I am trying to follow the example in Dask Cloud Provider. Without changing any code, I get a timeout error.
[2020-05-24 03:45:38] INFO - prefect.FlowRunner | Beginning Flow run for 'Dask Cloud Provider Test'
[2020-05-24 03:45:38] INFO - prefect.FlowRunner | Starting flow run.
[2020-05-24 03:45:48] ERROR - prefect.FlowRunner | Unexpected error: OSError("Timed out trying to connect to '<tcp://>' after 10 s: Timed out trying to connect to '<tcp://>' after 10 s: connect() didn't finish in time")
Traceback (most recent call last):
  File "miniconda3/envs/py37moc/lib/python3.7/site-packages/distributed/comm/core.py", line 232, in connect
  File "/miniconda3/envs/py37moc/lib/python3.7/site-packages/distributed/comm/core.py", line 213, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to '<tcp://>' after 10 s: connect() didn't finish in time
Is there a port to open? In the ECS console I do see a cluster generated and closed. https://docs.prefect.io/orchestration/execution/dask_cloud_provider_environment.html#process
Hi Itay, if you're running this from your local machine and trying to connect to the cluster
Copy code
is the internal IP. If you switch to
Copy code
in the flow.run, that will connect on the external IP.
hey itay, did you ever resolve this? i am having an issue that seems pretty much identical to yours -- an ECS cluster is generated, a worker and scheduler start, and then this error occurs and everything shuts down. i'm using the default vpc/security group/etc, and it seems that the generated security group opens the relevant ports, so i'm not totally sure what's going on
for anyone who stumbles upon this in the future, the issue is exactly as kingsley described -- in the documentation, if you use the
directly and run
as described, you can use the cluster public IP via
. if you are trying to use the
, because the cluster is instantiated dynamically, i don't know that you can actually control that (i.e., trying to access
will just return
until execution begins and
is called). as far as i can tell, when using this environment, you need to have a machine in your
that can communicate with the cluster via the private IP. if you're just testing things locally and want to confirm this is your issue, you can go into the source code for
and change the line in the
self.executor_kwargs["address"] = self.cluster.scheduler.address
to
self.executor_kwargs["address"] = self.cluster.scheduler_address
and your flow should execute without issue. i'm a new prefect user, so i'm still a little shaky on the environment/executor/agent distinction and some of the above may be wrong, but hopefully this helps prevent others from wasting a few hours of their time like i did : )
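A quick way to confirm that this is a reachability problem (private vs. public IP) rather than anything inside Prefect is to try a plain TCP connection to the scheduler address from the machine that runs the flow. The snippet below is only a rough sketch using the Python standard library. The helper name `can_reach` is made up for illustration and is not part of Prefect, Dask, or dask-cloudprovider.

```python
import socket
from urllib.parse import urlparse


def can_reach(address: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to a 'tcp://host:port' address succeeds.

    Hypothetical helper for debugging only; it does not speak the Dask
    protocol, it just checks that the port accepts a connection.
    """
    parsed = urlparse(address)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"expected tcp://host:port, got {address!r}")
    try:
        # create_connection raises an OSError subclass on refusal or timeout
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `can_reach("tcp://<scheduler-ip>:8786")` (8786 is the default Dask scheduler port) with the private IP from the ECS console, and then again with the public IP, should show which of the two your machine can actually connect to. If the private address returns `False` and the public one returns `True`, you are most likely hitting the issue described above.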