I'm having trouble running a simple "hello world" ...
# ask-community
b
I'm having trouble running a simple "hello world" flow on a temporary dask EC2Cluster. I'm new to dask. The relevant part of my flow script is:
flow.executor = DaskExecutor(
    cluster_class='dask_cloudprovider.aws.EC2Cluster',
    cluster_kwargs={'n_workers': 2, 'docker_image': 'prefecthq/prefect', 'debug': True}
)
flow.run()
This fails with:
FlowRunner: ClientError('An error occurred (InvalidParameterValue) when calling the RunInstances operation: User data is limited to 16384 bytes')
The issue is that the docker run command looks like this:
docker run --net=host   prefecthq/prefect env DASK_INTERNAL_INHERIT_CONFIG="a_very_very_long_string" python -m distributed.cli.dask_scheduler
So I guess the command winds up being too long. Does anyone know a workaround for this issue?
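To make the size limit concrete, here is a minimal sketch that measures a command string against the 16384-byte limit quoted in the error. The helper names (`user_data_size`, `fits_in_user_data`, `build_command`) are hypothetical, and the JSON plus base64 encoding is only a stand-in for however dask actually serializes its config. The real user data also contains more than this one command, so treat the result as a rough lower bound.

```python
import base64
import json

# Limit quoted in the RunInstances error message above.
EC2_USER_DATA_LIMIT_BYTES = 16384


def user_data_size(command: str) -> int:
    """Return the size in bytes of the UTF-8 encoded command."""
    return len(command.encode("utf-8"))


def fits_in_user_data(command: str, limit: int = EC2_USER_DATA_LIMIT_BYTES) -> bool:
    """Return True if the command alone stays within the limit."""
    return user_data_size(command) <= limit


def build_command(config: dict) -> str:
    """Build a docker run command shaped like the one in the question.

    Assumption: JSON + base64 stands in for the real config serialization.
    """
    encoded = base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")
    return (
        "docker run --net=host prefecthq/prefect "
        f'env DASK_INTERNAL_INHERIT_CONFIG="{encoded}" '
        "python -m distributed.cli.dask_scheduler"
    )


# A small config fits; a config padded with 20000 characters does not.
small = build_command({"distributed": {"comm": {"timeouts": {"connect": "10s"}}}})
large = build_command({"padding": "x" * 20000})

print(user_data_size(small), fits_in_user_data(small))
print(user_data_size(large), fits_in_user_data(large))
```

If the inherited config is what pushes the command over the limit, shrinking the local dask config is one thing to try, though whether that is enough here is a guess.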
j
This issue is unrelated to prefect itself (it's an issue with the EC2Cluster implementation). I suggest filing an issue in the dask-cloudprovider repo (https://github.com/dask/dask-cloudprovider) to see if they have any ideas there.
FWIW the non-vm cluster managers (e.g. FargateCluster) shouldn't have this issue.
b
looks like an open issue
I get farther with a FargateCluster: it creates a cluster and a scheduler task. However, no worker tasks are created (I have n_workers=2), and the flow run produces the error
prefect.FlowRunner | Unexpected error: OSError('Timed out trying to connect to tcp://54.186.22.123:8786 after 10 s')
Is this also a dask bug?
j
Hmmm, that might have to do with your AWS configuration: you need to be able to connect to the dask scheduler running in Fargate from your local client process. Prefect doesn't do anything special here, so the solution is also likely related to dask-cloudprovider (but maybe not a bug, rather some setting/config you're missing). Apologies that I'm not more helpful here.
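One way to check the connectivity point independently of prefect and dask is a plain TCP probe from the machine running the client. This is a minimal sketch using only the standard library; `scheduler_reachable` is a hypothetical helper name, and the demonstration below uses a local listening socket as a stand-in for the scheduler so that it runs without any AWS resources.

```python
import socket


def scheduler_reachable(host: str, port: int = 8786, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False


# Local demonstration: a listening socket on an ephemeral port plays the scheduler.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
demo_port = server.getsockname()[1]

print(scheduler_reachable("127.0.0.1", demo_port, timeout=2.0))  # listener is up
server.close()
print(scheduler_reachable("127.0.0.1", demo_port, timeout=2.0))  # listener is gone
```

For the case in this thread, the call would be `scheduler_reachable("54.186.22.123", 8786)`. If it returns False, the AWS network settings are worth a look, for example whether the security group allows inbound traffic on port 8786 from the client's address; that is a likely cause, not a confirmed one.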
b
no problem, I appreciate your help!