# ask-marvin
k
@Marvin are Prefect workers and the jobs they submit as Kubernetes pods related for compute purposes? If a worker has X CPU allocated, will each flow scheduled as a k8s job use a portion of that X CPU, or are they distinct?
m
In Prefect with Kubernetes, workers and the jobs they submit are related but have distinct compute resource allocations. When you configure a Kubernetes work pool, you specify resource requests, such as CPU requests, that each Kubernetes job will ask for when it is launched. Setting a CPU request in the job configuration specifies the amount of CPU that each individual Kubernetes job will request, so every flow run submitted as a Kubernetes job can have its own resource request, independent of the worker's resource allocation. The worker, in this context, is simply the entity that submits jobs to the Kubernetes cluster; it does not share its resources with the jobs it submits.

Here's the relevant information from the documentation on setting CPU requests in the Kubernetes work pool configuration:
- Kubernetes Deployment Guide

Remember that the actual allocation of resources to pods depends on the cluster's available resources and the scheduler's decisions. For optimal resource usage and isolation, each Kubernetes job can have its own CPU and memory requests and limits defined, so jobs don't necessarily share the worker's CPU allocation.
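For concreteness, a minimal sketch of the distinction described above, assuming Prefect 2.x and a Kubernetes work pool whose base job template has been customized (per the Kubernetes Deployment Guide) to expose `cpu_request` and `memory_request` variables; the pool, deployment, and image names below are hypothetical:

```python
from prefect import flow

@flow(log_prints=True)
def my_flow():
    print("hello from a k8s job pod")

if __name__ == "__main__":
    # Each flow run becomes its own Kubernetes Job pod; the requests below
    # apply to that pod, not to the long-running worker pod that submits it.
    my_flow.deploy(
        name="k8s-example",                   # hypothetical deployment name
        work_pool_name="my-k8s-pool",         # hypothetical Kubernetes work pool
        image="my-registry/my-image:latest",  # hypothetical image
        job_variables={
            "cpu_request": "500m",      # assumes the pool's base job template exposes this variable
            "memory_request": "512Mi",  # assumes the pool's base job template exposes this variable
        },
    )
```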
k
they're distinct
they might share resources from the same node if the job gets scheduled on the node that the worker is running on though
k
thought so but after moving all our flows to workers I'm suddenly getting all kinds of compute issues
prefect.flow_runs.worker - Pod event 'FailedScheduling' at 2024-05-20 21:50:40+00:00: 0/14 nodes are available: 1 node(s) had untolerated taint {ToBeDeletedByClusterAutoscaler: 1716241751}, 4 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 5 Insufficient memory, 9 Insufficient cpu. preemption: 0/14 nodes are available: 5 Preemption is not helpful for scheduling, 9 No preemption victims found for incoming pod..
if the job gets scheduled on the node that the worker is running on
oh this is highly likely the case
k
0/14 nodes are available
it seems like most resources in the cluster are in use though
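One rough way to check that, sketched with the official `kubernetes` Python client (this assumes kubeconfig access to the cluster, and the quantity parser only handles plain and milli-CPU values):

```python
from collections import defaultdict
from kubernetes import client, config

def cpu_millicores(qty: str) -> int:
    # Parses "500m" or whole-core values like "2"; other unit suffixes are not handled here.
    return int(qty[:-1]) if qty.endswith("m") else int(float(qty) * 1000)

config.load_kube_config()
v1 = client.CoreV1Api()

# Sum the CPU requests of all scheduled, non-completed pods, grouped by node.
requested = defaultdict(int)
for pod in v1.list_pod_for_all_namespaces().items:
    if pod.status.phase in ("Succeeded", "Failed") or not pod.spec.node_name:
        continue
    for container in pod.spec.containers:
        reqs = container.resources.requests if container.resources else None
        cpu = (reqs or {}).get("cpu")
        if cpu:
            requested[pod.spec.node_name] += cpu_millicores(cpu)

# Compare against each node's allocatable CPU to see which nodes are effectively full.
for node in v1.list_node().items:
    allocatable = cpu_millicores(node.status.allocatable["cpu"])
    print(f"{node.metadata.name}: {requested[node.metadata.name]}m requested / {allocatable}m allocatable")
```

Nodes where requested CPU sits close to allocatable are the ones the scheduler reports as `Insufficient cpu`, even if the cluster is below its maximum node count.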
k
Yeah that makes sense ^ I didn't change the K8s job compute size, and we matched the worker replica count to the former agent replica count 🤔
k
are those long running pods? what else is happening in that cluster?
k
Strictly Prefect infra: old agents (6 replicas), new workers (6 replicas), and all flows run as k8s jobs/pods
so the workers and agents are long-running. Odd that we're getting the compute issues too, since we're not at our max node allotment (max 25); we're wavering between 9 and 14 nodes
Who would have thought... we needed to up our machine size 😉
k
ah! so so glad you got it figured out!
KG making it work