# ask-marvin
k
@Marvin are Prefect workers and the jobs they submit as Kubernetes pods related for compute purposes? If a worker has X CPU allocated, will each flow scheduled as a k8s job use a portion of that X CPU, or are they distinct?
m
In Prefect with Kubernetes, workers and the jobs they submit are related but have distinct compute resource allocations. When you configure a Kubernetes work pool, you specify resource requests, such as CPU requests, that each Kubernetes job will ask for when it is launched. Setting a CPU request in the job configuration specifies the amount of CPU that each individual Kubernetes job will request, so every flow run submitted as a Kubernetes job can have its own resource request, independent of the worker's resource allocation. The worker, in this context, is simply the entity that submits jobs to the Kubernetes cluster; it does not share its resources with the jobs it submits.

Here's the relevant information from the documentation on setting CPU requests in the Kubernetes work pool configuration:
- Kubernetes Deployment Guide

Remember that the actual allocation of resources to pods depends on the cluster's available resources and the scheduler's decisions. For optimal resource usage and isolation, each Kubernetes job can have its own CPU and memory requests and limits defined, so jobs don't necessarily share the worker's CPU allocation.
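For concreteness, a minimal sketch of the distinction described above, assuming Prefect 2.x and a Kubernetes work pool whose base job template has been customized (per the Kubernetes Deployment Guide) to expose `cpu_request` and `memory_request` variables; the pool, deployment, and image names below are hypothetical:

```python
from prefect import flow

@flow(log_prints=True)
def my_flow():
    print("hello from a k8s job pod")

if __name__ == "__main__":
    # Each flow run becomes its own Kubernetes Job pod; the requests below
    # apply to that pod, not to the long-running worker pod that submits it.
    my_flow.deploy(
        name="k8s-example",                   # hypothetical deployment name
        work_pool_name="my-k8s-pool",         # hypothetical Kubernetes work pool
        image="my-registry/my-image:latest",  # hypothetical image
        job_variables={
            "cpu_request": "500m",      # assumes the pool's base job template exposes this variable
            "memory_request": "512Mi",  # assumes the pool's base job template exposes this variable
        },
    )
```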
k
they're distinct
they might share resources from the same node if the job gets scheduled on the node that the worker is running on though
k
thought so but after moving all our flows to workers I'm suddenly getting all kinds of compute issues
prefect.flow_runs.worker - Pod event 'FailedScheduling' at 2024-05-20 21:50:40+00:00: 0/14 nodes are available: 1 node(s) had untolerated taint {ToBeDeletedByClusterAutoscaler: 1716241751}, 4 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 5 Insufficient memory, 9 Insufficient cpu. preemption: 0/14 nodes are available: 5 Preemption is not helpful for scheduling, 9 No preemption victims found for incoming pod..
if the job gets scheduled on the node that the worker is running on
oh this is highly likely the case
k
0/14 nodes are available
it seems like most resources in the cluster are in use though
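One rough way to check that, sketched with the official `kubernetes` Python client (this assumes kubeconfig access to the cluster, and the quantity parser only handles plain and milli-CPU values):

```python
from collections import defaultdict
from kubernetes import client, config

def cpu_millicores(qty: str) -> int:
    # Parses "500m" or whole-core values like "2"; other unit suffixes are not handled here.
    return int(qty[:-1]) if qty.endswith("m") else int(float(qty) * 1000)

config.load_kube_config()
v1 = client.CoreV1Api()

# Sum the CPU requests of all scheduled, non-completed pods, grouped by node.
requested = defaultdict(int)
for pod in v1.list_pod_for_all_namespaces().items:
    if pod.status.phase in ("Succeeded", "Failed") or not pod.spec.node_name:
        continue
    for container in pod.spec.containers:
        reqs = container.resources.requests if container.resources else None
        cpu = (reqs or {}).get("cpu")
        if cpu:
            requested[pod.spec.node_name] += cpu_millicores(cpu)

# Compare against each node's allocatable CPU to see which nodes are effectively full.
for node in v1.list_node().items:
    allocatable = cpu_millicores(node.status.allocatable["cpu"])
    print(f"{node.metadata.name}: {requested[node.metadata.name]}m requested / {allocatable}m allocatable")
```

Nodes where requested CPU sits close to allocatable are the ones the scheduler reports as `Insufficient cpu`, even if the cluster is below its maximum node count.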
k
Yeah that makes sense ^ I didn't change the K8s job compute size, and we matched the worker replica count to the former agent replica count 🤔
k
are those long running pods? what else is happening in that cluster?
k
Strictly Prefect infra: old agents (6 replicas), new workers (6 replicas), and all flows run as k8s jobs/pods
so the workers and agents are long-running. Odd that we're getting the compute issues too, since we're not at our max node allotment (max 25); we're wavering between 9 and 14 nodes
Who would have thought... we needed to up our machine size 😉
k
ah! so so glad you got it figured out!
KG making it work