
Ofir

08/08/2023, 7:08 PM
Is there a way to put a Prefect GPU agent/worker to (temporary) sleep when it's not being used in Kubernetes? Motivation: let's assume the cost of a heavy GPU machine is $2k per day. Let's also assume our Prefect deployment runs in AKS (managed Azure Kubernetes) and we have separate pods for `prefect-server` and `prefect-agent`. What if the `prefect-agent` (which is running on a GPU node in the cluster) is idle 90% of the day? That means it's underutilized and we're wasting money for no good reason. Reference: Airflow provides the Kubernetes Executor, which spins up on-demand/ad-hoc worker pods. Since Prefect thought of everything, I'm sure there is either a built-in capability for this or a design pattern for achieving it. Thanks!

nicholasnet

08/10/2023, 3:26 PM
I'm also interested in a solution for this use case. It's a very common one, in my opinion.

Luke Segars

08/11/2023, 2:48 AM
@Ofir we use prefect-ray with an existing Ray cluster hosted in k8s, and support autoscaling of GPU nodes for cost-saving reasons. Our node pool defaults to zero nodes but scales up as GPU requests come in from tasks. Happy to provide more details, but if you're open to considering Ray, it has worked quite well for our use case and integrates cleanly into Prefect as well: https://github.com/PrefectHQ/prefect-ray

Ofir

08/11/2023, 12:57 PM
Thanks a lot @Luke Segars! What's unique to Ray that makes this work operationally?
Isn't GPU node scaling supposed to be a question that's agnostic to the executor backend (e.g. Ray/Dask)?

Luke Segars

08/12/2023, 2:41 AM
It could very well work with other backends too; we used to use Dask but transitioned to Ray, and the pattern should be largely backend-agnostic.
Our setup, which may be a little different from yours based on your description:
1. We run a prefect-agent pod. The agent pod is lightly resourced (no GPU, low vCPU, etc.).
2. We also run a Ray cluster (see KubeRay) that keeps one CPU worker alive at all times, and can autoscale CPU workers and GPU workers independently.
3. When a flow run comes in, the agent uses the `RayTaskRunner` from prefect-ray to run tasks on the Ray cluster. Ray autoscales as needed based on the resource demands of all flow runs that are running at any time.
This is the setup we came up with so that the agent is running all the time and ready to scale compute resources (including GPU) up on demand. It's working pretty well.