# prefect-community
b
Running into a bit of an inscrutable error that I strongly suspect is not really Prefect's issue, but rather an issue with my Dask cluster setup. When submitting a flow run to my cluster, I see these warnings repeatedly:
distributed.client - WARNING - Couldn't gather 4 keys, rescheduling {'extract-2e6b9db4-5567-486a-af2f-8ad07bc6a77b': ('tcp://172.17.0.2:43849',), 'transform2-b5e3d1d0-fb14-44f6-afe6-6b47ec5ab277': ('tcp://172.17.0.2:43849',), 'transform1-8156dfd1-1a53-4213-bf66-c70d1aab724f': ('tcp://172.17.0.2:43849',), 'load-fa2ae950-213e-4044-9773-fdacd7b057b4': ('tcp://172.17.0.2:43849',)}
I'm trying to deploy my cluster in Aptible, and I suspect I haven't set up enough plumbing to expose the right ports so the scheduler and worker processes can communicate. Does this ring any bells?
c
I haven’t seen this particular error before but will definitely look into it! Your hypothesis about ports not being exposed properly does sound like a likely candidate; it looks like the scheduler was able to submit to the workers but couldn’t gather the results from the workers 🧐
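For anyone hitting the same warning: one quick way to test the port hypothesis is to check, from the machine running the client or scheduler, whether the worker address named in the warning accepts TCP connections at all. A minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, not part of Dask or Prefect, and a successful connection only shows the port is reachable, not that the Dask protocol works end to end.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (address copied from the warning above; substitute your own):
#   can_connect("172.17.0.2", 43849)
```

If this returns False for the address the worker advertised, the worker is advertising an address that is not routable from where you ran the check, which matches the "couldn't gather" symptom.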
b
I am on the trail myself, will let you know if I figure it out
Just a heads up, I think this is strictly an issue with Aptible's deploy environment being idiosyncratic. Aptible does some magic under the hood to map hostnames, and containers are only accessible via ELB endpoints, so I was having some trouble starting up the scheduler and worker in such a way that they could reach each other. By starting the worker with something like
dask-worker --listen-address tcp://0.0.0.0:8788 --contact-address tcp://<Worker_ELB_hostname>:8788 <Scheduler_ELB_hostname>:8786
I was able to get the desired plumbing in place.
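The key idea in that command is the split between the address the worker binds to inside the container (`--listen-address`) and the address it advertises to the scheduler and other workers (`--contact-address`). A small sketch of assembling the same command from hostnames, in case it helps with scripting a deploy; `worker_command` and the example hostnames are hypothetical, and the default ports are just the ones used in the command above.

```python
import shlex


def worker_command(scheduler_host, worker_host, worker_port=8788, scheduler_port=8786):
    """Build the dask-worker argv for a worker that sits behind a load balancer."""
    return [
        "dask-worker",
        # Bind on all interfaces inside the container.
        "--listen-address", f"tcp://0.0.0.0:{worker_port}",
        # Advertise the externally reachable endpoint instead of the container IP.
        "--contact-address", f"tcp://{worker_host}:{worker_port}",
        # Scheduler address, passed positionally as in the original command.
        f"{scheduler_host}:{scheduler_port}",
    ]


# Hypothetical hostnames, for illustration only.
cmd = worker_command("scheduler.example.com", "worker.example.com")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` or printed into a container entrypoint; this assumes the worker ELB forwards the same port the worker listens on.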
c
Ah awesome, this is really good to know - thanks @Brian McFeeley!
@Marvin archive “Issue with running prefect on dask + aptible”