# prefect-community

Brian McFeeley

07/31/2019, 9:45 PM
any ideas for logging, etc. that I could turn on to get better detail would be helpful! Otherwise, I'll keep digging and let you know if I figure it out

Chris White

07/31/2019, 11:50 PM
I honestly don’t have a good idea right now about what logs to add here --> could you simply try increasing the memory on the container? Alternatively, do your flows benefit from parallelism / distributed compute, or could you run them using the `LocalExecutor`?
I can reach out to some additional Dask folks and see if they have any experience with aptible that might be relevant here

Brian McFeeley

08/01/2019, 3:45 PM
the mystery is solved
i can now rest. lol

Chris White

08/01/2019, 3:45 PM
💯 🎉 💯 was it an aptible setting?

Brian McFeeley

08/01/2019, 3:47 PM
so, the issue is as I expected, nothing to do with prefect, but aptible's deploy environment. TLDR:
• containers are only accessible through tcp load-balancers
• I tried to get clever and open a bunch of ports on the shared docker image, and on the ELB, and set the entrypoint for the containers to be subtly different per-container that we scaled, with each worker listening on a distinct port
• I expected/hoped that the requests through the ELB would either be routed directly to the right container (not a reasonable assumption), or they'd multicast to all the containers and whichever one was listening on that port would act and the others would ignore it
• Actually, aptible will hear a request on the ELB at port e.g. 1234, and randomly pick from the containers behind the ELB and pass through on that same port
what happens in this architecture is, if you have very few containers, the ELB "gets it right" frequently, and the issue hides. If you scale to, say, 8-10 workers, it's wrong roughly 80-90% of the time, so the scheduler can't get in touch with the correct worker and assumes it is lost.
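A rough way to sanity-check those numbers: if the load balancer picks uniformly at random among N containers (an assumption for illustration, not something confirmed about aptible's routing), a request reaches the intended worker with probability 1/N. The sketch below compares that with a small simulation; the function names are hypothetical and nothing here touches Dask or aptible.

```python
import random


def miss_rate(num_workers: int) -> float:
    """Chance that a uniformly random pick is NOT the intended worker."""
    return 1 - 1 / num_workers


def simulate_miss_rate(num_workers: int, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the same miss rate by simulating random routing."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        target = rng.randrange(num_workers)  # worker the scheduler wants to reach
        routed = rng.randrange(num_workers)  # container the load balancer happens to pick
        if routed != target:
            misses += 1
    return misses / trials


if __name__ == "__main__":
    for n in (2, 8, 10):
        print(n, round(miss_rate(n), 3), round(simulate_miss_rate(n), 3))
```

Under that uniform-pick assumption, two containers misroute about half the time, which is easy to mistake for flakiness, while 8-10 containers misroute most requests.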

Chris White

08/01/2019, 3:49 PM
ahhhhh nice work!!

Brian McFeeley

08/01/2019, 3:49 PM
aptible is architected under the assumption that all containers will run identical processes and be a bit "dumber" than dask workers are expected to be
the "right" solution in this deployment environment is to create a separate ELB for each port, which sounds messy, but "it works" ™️. We may not run in aptible at all in the long term, so it's good to know for now, and we don't particularly care that deployment of the cluster is super hairy atm
thanks for your patience with me!

Chris White

08/01/2019, 3:50 PM
yea absolutely thanks for sharing the solution and digging deep on this!