Brian McFeeley

07/29/2019, 9:45 PM
Running into a bit of an inscrutable error that I strongly suspect is not really Prefect's issue, but rather an issue with my Dask cluster setup. When submitting a flow run to my cluster, I see these warnings repeatedly:
distributed.client - WARNING - Couldn't gather 4 keys, rescheduling {'extract-2e6b9db4-5567-486a-af2f-8ad07bc6a77b': ('tcp://172.17.0.2:43849',), 'transform2-b5e3d1d0-fb14-44f6-afe6-6b47ec5ab277': ('tcp://172.17.0.2:43849',), 'transform1-8156dfd1-1a53-4213-bf66-c70d1aab724f': ('tcp://172.17.0.2:43849',), 'load-fa2ae950-213e-4044-9773-fdacd7b057b4': ('tcp://172.17.0.2:43849',)}
I'm trying to deploy my cluster in Aptible, and I suspect I haven't exposed the right ports for the scheduler and worker processes to communicate. Does this ring any bells?

Chris White

07/29/2019, 9:50 PM
I haven’t seen this particular error before but will definitely look into it! Your hypothesis about ports not being exposed properly does sound like a likely candidate; it looks like the scheduler was able to submit to the workers but couldn’t gather the results from the workers 🧐
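One way to test that gather path outside of Prefect is a plain Dask round trip; a minimal sketch, assuming dask.distributed is installed and using a placeholder <Scheduler_ELB_hostname> for the scheduler's address:

from dask.distributed import Client

# Connect to the remote Dask scheduler (hostname is a placeholder)
client = Client("tcp://<Scheduler_ELB_hostname>:8786")

# Submit a trivial task and fetch it back; .result() exercises the same
# client -> worker gather path that produces the warning above
print(client.submit(lambda x: x + 1, 41).result())

If the submit succeeds but .result() hangs or the same warning appears, the address the worker advertises isn't reachable from the client.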

Brian McFeeley

07/29/2019, 9:52 PM
I'm on the trail myself, will let you know if I figure it out
Just a heads up, I think this is strictly an issue with the Aptible deploy environment being idiosyncratic. Aptible does some magic under the hood to map hostnames, and containers are only accessible via ELB endpoints, so I was having trouble starting the scheduler and worker in such a way that they could see each other. By starting the worker with something like
dask-worker --listen-address tcp://0.0.0.0:8788 --contact-address tcp://<Worker_ELB_hostname>:8788 <Scheduler_ELB_hostname>:8786
I was able to get the desired plumbing in place
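For completeness, once the scheduler and workers can reach each other, pointing a flow run at the cluster is a matter of handing the scheduler's address to the executor; a minimal sketch using Prefect 0.x's DaskExecutor, where the ELB hostname is a placeholder and flow stands for the ETL flow whose task keys appear in the warning above:

from prefect.engine.executors import DaskExecutor

# Executor that submits task runs to the remote Dask scheduler
executor = DaskExecutor(address="tcp://<Scheduler_ELB_hostname>:8786")

# Run the flow against the cluster instead of the default local executor
flow.run(executor=executor)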

Chris White

07/30/2019, 6:35 PM
Ah awesome, this is really good to know - thanks @Brian McFeeley!
@Marvin archive “Issue with running prefect on dask + aptible”