
Jason Bertman

10/05/2022, 4:54 PM
Hey all, we just recently set up Orion in our EKS cluster to pilot switching over to RayTaskRunner. We were experiencing some pretty untenable memory bloat with Dask at high task counts (25K+). My question about Ray: does the temporary local cluster feature the same worker adaptation that Dask does? In my testing it seems to function basically the same as a concurrent executor (single pod, local scheduling). Am I missing something here? Or is this a case where we need to deploy a standalone Ray cluster to get that use case?

Mason Menges

10/05/2022, 4:57 PM
Without seeing the code it's hard to say 100%, but are you calling .submit() on the tasks? The Dask/Ray task runners will only run tasks asynchronously if .submit() is called on the task within the flow: https://docs.prefect.io/tutorials/dask-ray-task-runners/?h=dask#dask-and-ray-task-runners
Otherwise it just runs synchronously.

Jason Bertman

10/05/2022, 4:59 PM
Yep, using submit.
Are you saying I should see additional pods spin up as the workload grows?
It's certainly running async; it just won't scale past a certain point without more resources/workers being made available (as Dask does).

Mason Menges

10/05/2022, 5:04 PM
Not necessarily. I'm admittedly not as familiar with the Ray task runner, so that was just an initial thought. I can dig around a bit and let you know what I turn up, though.

Jason Bertman

10/05/2022, 5:12 PM
Gotcha, thanks in advance! I'll keep messing around on my end. It is "working", but I'm not certain I see the benefit of the temporary cluster when it isn't running in an actual remote capacity. But I'm also a Ray noob, so 🤷
Maybe I'm after the KubeRay operator? https://docs.ray.io/en/master/cluster/kubernetes/index.html Then I could just point RayTaskRunner at that? Looks like that's just the new version; Ray autoscaling already existed.

Mason Menges

10/05/2022, 5:34 PM
For starters, I believe you should be able to pass in any of the ray.init kwargs listed here https://docs.ray.io/en/latest/ray-core/package-ref.html to specify the resources available to the cluster, which might help. The KubeRay operator definitely looks promising as well.

Jason Bertman

10/05/2022, 5:44 PM
It certainly seems like there needs to be an existing head node with the autoscaler running... perhaps it's safe to assume temporary clusters haven't reached that point yet?
@Mason Menges Just an update on this: I was able to deploy a Ray autoscaling setup via KubeRay and then point a RayTaskRunner at the head node. Autoscaling works great as long as resourcing is set up properly 👍 I wasn't able to get it to work with a temporary cluster, but this covers my use case, so it works for me.

Mason Menges

10/11/2022, 5:11 PM
Awesome, that's great to hear 😄