
Ofir

08/08/2023, 7:08 PM
Is there a way to have a Prefect GPU agent/worker go to sleep (temporarily) when it's not used in Kubernetes? Motivation: let's assume the cost of a heavy GPU machine is $2k per day. Let's also assume that our Prefect deployment runs in AKS (managed Azure Kubernetes) and we have separate pods for prefect-server and prefect-agent. What if the prefect-agent (which is running on a GPU node in the cluster) is idle 90% of the day? This means it's underutilized and we waste money for no good reason. Reference: Airflow provides the Kubernetes Executor - on-demand/ad-hoc worker pods. Since Prefect thought of everything - I'm sure there is either a built-in capability for this or a design pattern for achieving it. Thanks!

Henning Holgersen

08/08/2023, 7:31 PM
I'm not an expert, but my experience is that AKS pools have trouble scaling down to 0 again. I think I read something about that as well - about autoscaling only going down to 1. It worked for me for a while though, until one day I realized my beefy pool had been running for weeks 💸. I would try to use ACI for this type of case.

Ofir

08/08/2023, 7:37 PM
Thanks Henning! Sorry for the ignorance but what’s ACI and how do I use it?

Henning Holgersen

08/08/2023, 7:41 PM
Azure Container Instances, basically running containers without AKS. Startup time is a little long, but compared to scaling an AKS pool it is quite OK. I made a little how-to a while back if you want to check it out: https://github.com/radbrt/prefect_aci
🙌 1

Ofir

08/08/2023, 7:45 PM
What if Prefect could piggyback on my application's AKS cluster? In that case I'd have at least 1 Pod running. Then I could integrate with the keda.sh autoscaler and cheat by scaling up only the Prefect GPU worker node based on a trigger.
The trigger could come from a prerequisite Prefect deployment (flow run) that runs prior to the GPU-heavy workloads. It could send a webhook (HTTP POST) to some keda.sh autoscaler endpoint, and that would scale up the cluster and add the GPU node that runs the GPU-intensive Prefect deployment.
Essentially (ab)using Prefect as a poor man's resource orchestrator.

Henning Holgersen

08/08/2023, 7:50 PM
Hmmm… this time it is my turn to learn stuff - I have just barely heard of KEDA. Scaling up will not be a problem, but maybe you need some automation to make sure the node pool is scaled down to 0 after the flow ends?

Ofir

08/08/2023, 7:50 PM
Yeah for sure, this could be a post-Prefect deployment or a try-catch-finally that eventually scales down the cluster to its previous state.
This could also be a cron job or an RRule schedule that checks a global reference count (greater than 0), just in case the post-processing Prefect deployment never had the chance to run.
Very hacky and anything but native - I'm wondering if I'm entertaining myself with nonsense or whether this could actually work.
Concurrency would be an issue, along with other edge/corner cases I haven't taken into consideration, so it's just me thinking out loud here.
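Something like this rough sketch, maybe (untested; assumes the azure-mgmt-containerservice SDK, and the subscription/resource group/cluster/pool names are all placeholders):

# Rough sketch: wrap the GPU-heavy work in try/finally so the GPU agent pool
# is always scaled back down, even if the run fails. All names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.containerservice import ContainerServiceClient
from prefect import flow, task


def scale_gpu_pool(count: int) -> None:
    client = ContainerServiceClient(DefaultAzureCredential(), "<subscription-id>")
    pool = client.agent_pools.get("<resource-group>", "<cluster-name>", "gpupool")
    pool.count = count
    client.agent_pools.begin_create_or_update(
        "<resource-group>", "<cluster-name>", "gpupool", pool
    ).result()


@task
def run_gpu_workload():
    ...  # placeholder for the actual GPU-heavy work


@flow
def gpu_pipeline():
    scale_gpu_pool(1)      # bring the GPU node up
    try:
        run_gpu_workload()
    finally:
        scale_gpu_pool(0)  # scale back down to the previous state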

Henning Holgersen

08/08/2023, 7:55 PM
You mentioned this is a thing in Airflow? Do we know if it can scale to 0 in AKS? Re-checking how AKS does scaling would be a good idea before the hacky stuff. And if ACI is an option, that would be my second choice. After that, some automation or a regular check that stuff isn't running amok.

Ofir

08/08/2023, 7:58 PM
In Airflow you have the Kubernetes Executor, which provides you with one-off ad-hoc Pods - that seems to fit the bill here.
The most relevant Kubernetes object for this purpose is actually a Kubernetes Job with completions == 1.
But again, the Prefect server would have to be dynamic enough and aware of these dynamic workers/agents, so to me it feels like sneaking behind the Prefect server's back.
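Roughly what I have in mind, as a sketch with the kubernetes Python client (image, names and namespace are placeholders):

# Sketch only: submit a one-off Job (completions == 1) straight to the Kubernetes API.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="gpu-oneoff"),
    spec=client.V1JobSpec(
        completions=1,
        parallelism=1,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(name="worker", image="myregistry/gpu-task:latest"),
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)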

Henning Holgersen

08/08/2023, 8:02 PM
A one-off pod is one thing - when it finishes it goes away, as does normal Prefect stuff after the TTL has expired… but what happens to the node pool is another thing. Am I misunderstanding something?

Ofir

08/08/2023, 8:05 PM
So the node pool can stay fixed MOST of the time, e.g. 2 nodes for your 3-tier web app.
And it will scale up to 3 nodes: 2 (regular) nodes and 1 special (GPU) node, on-demand.
Actually I think I understand what you're asking, and it's a valid question. It's not enough to just spin off a one-time Pod - you also need the Kubernetes worker node to be in place.

Henning Holgersen

08/08/2023, 8:06 PM
I thought all nodes in a pool had to be the same type?

Ofir

08/08/2023, 8:07 PM
I'm not sure if Airflow takes care of that behind the scenes - I should probably dig into the documentation or try it myself. I'm willing to bet that this is not just an Airflow/Prefect problem but a generic FinOps problem: how do I pay as I go for GPU/resource-intensive workloads on my heterogeneous Kubernetes cluster?
After all, that's the whole point and the claim to fame of the cloud, right? Scale up, scale down.
AFAICT there are 2 possible solutions: 1. Delegation - like what you did with ACI. 2. On-demand resource scaling - within the container orchestrator (Kubernetes).

Henning Holgersen

08/08/2023, 8:08 PM
Generally, a lot of cloud stuff scales, but not to 0. And Azure would probably give ACI as the generic answer.

Ofir

08/08/2023, 8:09 PM
Having said that, Prefect, which is a workflow orchestrator and not a resource orchestrator, needs to be resource-scaling friendly should the nodes scale up/down.
I'm OK with not scaling to 0, assuming I can piggyback both my web application and Prefect on the same Kubernetes cluster.
(that’s what I already do today anyhow)

Henning Holgersen

08/08/2023, 8:11 PM
Does your web app use a GPU? If so, I think you are good. Normal scaling works fairly well, within a node pool.

Ofir

08/08/2023, 8:12 PM
It does - we have a React frontend UI with Node.js as the backend, and a separate data science engine for processing data science workloads.
@Henning Holgersen I really appreciate the time you took to actually document that step-by-step in your GitHub repo
I’ll have to learn some more Azure terms such as Azure Deployment and ACI. I’m taking the AZ-900 course to beef up my Azure knowledge and feel comfortable with the different Azure services.
Will give your Delegation solution a try and holler at you with small questions if that's OK 🙂
👍 1

Christopher Boyd

08/09/2023, 6:35 PM
I've used GPU scaling node pools that scale to zero fine; it just needs a bit more configuration, since you want taints and tolerations to prevent extra work from being scheduled on that node when it shouldn't be.
So you'd need to add some labels to your nodes, and taints/tolerations to limit what uses it, so it scales back down. I haven't really had any issues scaling to zero myself.
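For instance, the node pool side could look roughly like this (untested sketch using the azure-mgmt-containerservice SDK; the SKU, label/taint values and resource names are just examples):

# Rough sketch: a user node pool the cluster autoscaler can take down to zero,
# tainted and labeled so only the GPU work lands on it. Names/SKU are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.containerservice import ContainerServiceClient
from azure.mgmt.containerservice.models import AgentPool

client = ContainerServiceClient(DefaultAzureCredential(), "<subscription-id>")

gpu_pool = AgentPool(
    mode="User",
    vm_size="Standard_NC6s_v3",              # example GPU SKU
    count=0,                                 # start empty (bump to 1 if creation at 0 is rejected)
    enable_auto_scaling=True,
    min_count=0,                             # allow scale-to-zero
    max_count=1,
    node_labels={"prefect": "gpu"},          # matched by the job's nodeSelector
    node_taints=["prefect=gpu:NoSchedule"],  # matched by the job's toleration
)

client.agent_pools.begin_create_or_update(
    "<resource-group>", "<cluster-name>", "gpupool", gpu_pool
).result()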
👍 1
👀 1

Ofir

08/09/2023, 6:39 PM
Hi @Christopher Boyd thanks for the response! Could you provide links to code / documentation so I can get started?

Ofir

08/09/2023, 6:49 PM
Thanks! Does the Prefect server cope with that?
Some agents will be periodically down. Will the Prefect server send health checks/probes and crash if the workers are unavailable, or does it account for an intermittently available node?
And also, what is the hint/trigger to scale the GPU node (that runs the Prefect agent) up or down? Does it have native integration with Prefect, or do I need an extra Prefect agent that will trigger the scaling up?

Christopher Boyd

08/09/2023, 7:02 PM
This is mostly a Kubernetes construct, not a Prefect one. The server/health checks have no relevance to how your jobs get scheduled based on selectors.
You'd have a GPU-backed node pool, and labels/selectors to schedule work to it.
That would go into your job spec, so the jobs schedule to the right node pools.

Ofir

08/09/2023, 10:06 PM
I just fail to see the big picture and the flow
If I want to use Prefect on my Kubernetes cluster, and the Prefect server is running on the k8s cluster, and the Prefect agent is running on the GPU node pool on the same k8s cluster, then there has to be a mechanism to trigger/cause the scaling up/down.
I fail to understand how the Prefect agent running on the GPU node will come alive and get shut down.

Christopher Boyd

08/09/2023, 10:49 PM
The agent doesn't have to run on the node pool with the GPU.
The agent can run wherever; submitting a job to trigger a scale-up is just a submission to the kube API.
The agent runs on node pool A, which is not GPU based. You start a flow with a node selector and toleration that needs a GPU. The job gets submitted, and the cluster scheduler spins up a node in node pool B with a GPU.
This is how it would work without Prefect; with Prefect you just specify the additional configuration in the job spec or work pool, depending on your deployment pattern.
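With the agent-based setup that could look roughly like this (a sketch only - a KubernetesJob infrastructure block whose customizations patch the generated job manifest; the prefect=gpu label/taint values are placeholders matching whatever is on the node pool):

# Sketch: KubernetesJob infrastructure block (Prefect 2.x agent pattern) adding a
# nodeSelector and toleration so flow runs land on the GPU node pool.
from prefect.infrastructure import KubernetesJob

gpu_job = KubernetesJob(
    namespace="prefect",
    customizations=[
        {"op": "add",
         "path": "/spec/template/spec/nodeSelector",
         "value": {"prefect": "gpu"}},
        {"op": "add",
         "path": "/spec/template/spec/tolerations",
         "value": [{"key": "prefect", "operator": "Equal",
                    "value": "gpu", "effect": "NoSchedule"}]},
    ],
)
gpu_job.save("gpu-k8s-job", overwrite=True)
# then point the deployment at it, e.g.
#   prefect deployment build flows/train.py:train -n train-gpu -ib kubernetes-job/gpu-k8s-job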

Ofir

08/11/2023, 1:00 PM
The agent can run wherever; submitting a job to trigger a scale-up is just a submission to the kube API.
I think that was the piece missing for me, thanks for the clarification! Does that mean my Prefect deployment now needs to submit a job and interact with the Kubernetes API server? How does the agent on node pool A, which is not GPU based, pass the job to a node within node pool B which IS GPU based?
It is a little bit involved to do this orchestration and scaling up/down safely. My open concerns:
1. Separation of concerns - do I need to write resource orchestration code (interacting with the kube API server) within my Prefect deployments?
   a. This makes my Prefect deployments deal with both "workflow as code" (my business logic) and resource orchestration code, which I think is suboptimal.
   b. Also, who is responsible for monitoring the Kubernetes API server to know when the GPU node has been provisioned and is up? Do I now need to monitor the live pods?
2. Recovery - who is responsible for ensuring the pod is scaled down once it's no longer used, or if it crashed in the middle?
3. Concurrency / parallelism - what if you have more than one job running in parallel? How do you coordinate/synchronize these efforts so they don't step on each other?
There has to be an API to call/delegate tasks from the Prefect deployments running on the regular CPU node pool A to the Prefect deployments running on the special GPU node pool B. And to me it's not clear how the RPC or the delegation happens.

Christopher Boyd

08/11/2023, 6:33 PM
Hi Ofir - just to restate, this is a Kubernetes concept. A job in one node pool can submit work to any node pool in the cluster. The kube scheduler is responsible for determining where it goes, not Prefect. In normal practice, this happens based on measurements like resource availability and utilization. With more specific requirements such as instance types, it happens based on node selectors, taints, tolerations and labels. This would all be configured before Prefect ever enters the picture - you would just include the requirements you need in the job spec that gets submitted.
Prefect just submits a job to the cluster - the cluster is responsible for how scheduling happens.
If you want to run a worker on a t2 micro instance with no requirements, you can do that. If you want another node pool with a t8.large and a GPU instance, the worker doesn't have to run there - you'd just add a label to it, and a nodeSelector in your job, so when the job gets scheduled Kubernetes knows where it's supposed to go.
I would recommend reading this for more details : https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/

Ofir

08/11/2023, 6:36 PM
Thanks
When you say Prefect just submits a job to the cluster, do you mean the Prefect deployment interacts with the Kubernetes API server to do that?
How do you marshal / pass the parameters from the Prefect agent to the Kubernetes ad-hoc job?

Henning Holgersen

08/11/2023, 6:42 PM
The work pool is able to define some of the k8s job spec, and some can be defined in the deployment itself. I did a similar-but-different write-up that touches on this here: https://discourse.prefect.io/t/use-aks-workload-identity/3354 - maybe that gives you some ideas.
🙌 1
Another similar but different example for inspiration is in this thread: https://prefect-community.slack.com/archives/C048SVCEFF0/p1691019917870129

Ofir

08/11/2023, 6:44 PM
Thanks @Henning Holgersen, I'll take a look at both.
I'll see if I can come up with a fully working hello-world example to piece everything together. The devil is always in the details.

Christopher Boyd

08/11/2023, 6:50 PM
Here’s what a sample job spec looks like:
{
  "kind": "Job",
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "env": [],
            "name": "prefect-job"
          }
        ],
        "completions": 1,
        "parallelism": 1,
        "tolerations": [
          {
            "key": "prefect",
            "value": f"{DEPLOY_TYPE}",
            "effect": "NoSchedule",
            "operator": "Equal"
          }
        ],
        "nodeSelector": {
          "prefect": f"{DEPLOY_TYPE}"
        },
        "restartPolicy": "Never"
      }
    }
  },
  "metadata": {
    "labels": {}
  },
  "apiVersion": "batch/v1"
}
with a node selector and toleration
I have labels and taints applied to the nodes in my node pool - therefore, what I pass as values to my deployment or work pool gets applied to the job, and the job then gets scheduled to the appropriate node pool based on those.
Specifically, you technically only need a nodeSelector or affinity policy to schedule onto a GPU node, BUT those do not prevent other work from being scheduled onto that node, which can prevent it from scaling back to zero.