Hi - I am trying to follow the example in Dask Clo...
# prefect-community
i
Hi - I am trying to follow the example in Dask Cloud Providor. Not changing any code I get a timeout error.
Copy code
[2020-05-24 03:45:38] INFO - prefect.FlowRunner | Beginning Flow run for 'Dask Cloud Provider Test'
[2020-05-24 03:45:38] INFO - prefect.FlowRunner | Starting flow run.
[2020-05-24 03:45:48] ERROR - prefect.FlowRunner | Unexpected error: OSError("Timed out trying to connect to '<tcp://172.31.44.64:8786>' after 10 s: Timed out trying to connect to '<tcp://172.31.44.64:8786>' after 10 s: connect() didn't finish in time")
Traceback (most recent call last):
  File "miniconda3/envs/py37moc/lib/python3.7/site-packages/distributed/comm/core.py", line 232, in connect
    _raise(error)
  File "/miniconda3/envs/py37moc/lib/python3.7/site-packages/distributed/comm/core.py", line 213, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to '<tcp://172.31.44.64:8786>' after 10 s: connect() didn't finish in time
Is there a port to open? In the ECS console I do see a cluster generated and closed. https://docs.prefect.io/orchestration/execution/dask_cloud_provider_environment.html#process
k
Hi Itay, if you're running this from your local machine and trying to connect to the cluster
Copy code
cluster.scheduler.address
is the internal IP. If you switch to
Copy code
cluster.scheduler_address
in the flow.run that will connect on the external ip.
m
hey itay, did you ever resolve this? i am having an issue that seems pretty much identical to yours -- an ECS cluster is generated, a worker and scheduler start, and then this error occurs and everything shuts down. i'm using the default vpc/security group/etc, and it seems that the generated security group opens the relevant ports, so i'm not totally sure what's going on
for anyone who stumbles upon this in the future, the issue is exactly as kingsley described -- in the documentation, if you use the
FargateCluster
directly and run
flow.execute(...)
as described, you can use the cluster public IP via
cluster.scheduler_address
. if you are trying to use the
DaskCloudProviderEnvironment
, because the cluster is instantiated dynamically, i don't know that you can actually control that (i.e., trying to access
dask_cloud_provider_instance.cluster
will just return
None
until execution begins and
_create_dask_cluster
is called). as far as i can tell, when using this environment, you need to have a machine in your
VPC
that can communicate with the cluster via the private IP. if you're just testing things locally and want to confirm this is your issue, you can go into the source code for
cloud_provider.py
and change the line in the
_create_dask_cluster
method
Copy code
self.executor_kwargs["address"] = self.cluster.scheduler.address
to
Copy code
self.executor_kwargs["address"] = self.cluster.scheduler_address
and your flow should execute without issue. i'm a new prefect user, so i'm still a little shaky on the environment/executor/agent distinction and some of the above may be wrong, but hopefully this helps prevent others from wasting a few hours of their time like i did : )