Rasmus
07/09/2025, 11:00 AM
Marvin
07/09/2025, 11:00 AM
Marvin
07/09/2025, 11:02 AM
```bash
prefect work-queue set-concurrency-limit "default" 5 --pool "your-k8s-pool-name"
```
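If you would rather cap concurrency for the whole pool instead of a single queue, there is also a pool-level command (a sketch; the exact subcommands can vary slightly between Prefect versions, so check `prefect work-pool --help`):
```bash
# Limit the total number of concurrent flow runs the pool will submit
prefect work-pool set-concurrency-limit "your-k8s-pool-name" 5
```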
3. Additionally, ensure your Kubernetes cluster has enough resources to handle multiple pods. Check:
- Available CPU and memory in your node pool
- Any resource quotas set on your namespace
- Pod scheduling settings
4. If you're using the default work pool configuration, you might want to adjust the Kubernetes-specific settings in your work pool configuration to optimize for concurrent execution:
```yaml
work_pool:
  name: "k8s-pool"
  work_queue:
    name: "default"
    concurrency_limit: 5
  job_variables:
    resources:
      requests:
        memory: "512Mi"
        cpu: "250m"
      limits:
        memory: "1Gi"
        cpu: "500m"
```
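If you prefer to manage these Kubernetes defaults on the work pool itself rather than in each deployment, one approach (a sketch assuming a pool named "k8s-pool"; verify the flags with `prefect work-pool --help` for your Prefect version) is to export, edit, and re-apply the base job template:
```bash
# Dump the default Kubernetes base job template to a file you can edit
prefect work-pool get-default-base-job-template --type kubernetes > base-job-template.json

# After adding your resource requests/limits, apply it to the pool
prefect work-pool update "k8s-pool" --base-job-template base-job-template.json
```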
The key points to remember:
- Each flow run creates its own Kubernetes pod
- Concurrency is managed at the work queue level
- Make sure your Kubernetes cluster has enough resources to handle multiple pods
- The default queue concurrency limit might be set to 1, which would cause the behavior you're seeing
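To confirm what the queue's limit currently is before changing anything, a quick check might look like this (a sketch; adjust the pool and queue names, and note that the `--pool` flag may differ by Prefect version):
```bash
# Show the default queue's settings, including any concurrency limit
prefect work-queue inspect "default" --pool "your-k8s-pool-name"

# List all queues in the pool with their limits
prefect work-queue ls --pool "your-k8s-pool-name"
```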
Links for further reading:
- Work Pools Documentation
- How to Configure Work Queues
Let me know if you need help with any specific configuration settings or if you run into any issues while implementing these changes.
Rasmus
07/09/2025, 11:15 AM
Marvin
07/09/2025, 11:15 AM
Marvin
07/09/2025, 11:16 AM
1. First, describe the pending pod:
```bash
kubectl describe pod <pending-pod-name>
```
Look for the "Events" section at the bottom - this will tell you exactly why Kubernetes hasn't scheduled the pod.
2. Common reasons for pending pods and their solutions:
a. Node Selector/Affinity Issues
- If your work pool configuration includes node selectors or affinity rules, ensure they're not preventing scheduling on available nodes
- Check if your nodes are properly labeled
b. Resource Quotas
- Check if your namespace has resource quotas:
```bash
kubectl describe resourcequota -n your-namespace
```
- Verify current resource usage:
```bash
kubectl describe namespace your-namespace
```
c. Pod Anti-Affinity
- If you have pod anti-affinity rules (either in your work pool config or cluster config), they might prevent pods from being scheduled on the same node
3. To fix this, you might need to:
a. Adjust your work pool's job template to specify appropriate resource requests:
```yaml
job_configuration:
  kubernetes:
    pod_spec:
      containers:
        - name: prefect
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "200m"
```
b. Check if you have enough nodes available:
```bash
kubectl get nodes
kubectl describe nodes
```
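To see how much headroom each node actually has left, you can also grep the allocated-resources summary that `kubectl describe node` prints per node:
```bash
# Per-node view of requested vs. allocatable CPU and memory
kubectl describe nodes | grep -A 10 "Allocated resources"
```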
c. Verify that your nodes aren't tainted or have appropriate tolerations:
```bash
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
```
4. If you're using GKE specifically:
- Check if autoscaling is enabled and working properly
- Verify that your node pools have enough capacity
- Look at the GKE console for any resource quotas or limits at the project level
The key is to first identify exactly why Kubernetes isn't scheduling the pod using `kubectl describe pod`. The events section will tell you precisely what's preventing the scheduling, whether it's resource constraints, node selector issues, or other configuration problems.
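If several runs are queued at once, it can help to narrow the search before describing individual pods (a sketch; replace `prefect` with whichever namespace your flow-run pods land in):
```bash
# List only the pods that are still waiting to be scheduled
kubectl get pods -n prefect --field-selector=status.phase=Pending

# Show recent scheduling failures across the namespace, newest last
kubectl get events -n prefect --field-selector reason=FailedScheduling --sort-by=.lastTimestamp
```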
Would you like me to help interpret the output of `kubectl describe pod` for your pending pod? That would give us the exact reason for the scheduling delay.
Marvin
07/09/2025, 11:20 AM
Marvin
07/09/2025, 11:20 AM
1. Insufficient Resources:
```
0/2 nodes are available: 1 Insufficient memory, 2 Insufficient cpu
```
Your cluster doesn't have enough immediate resources available on the existing nodes.
2. Autoscaling Failure:
```
Warning  FailedScaleUp  2m27s  cluster-autoscaler  Node scale up in zones europe-west3-c associated with this pod failed: GCE quota exceeded
```
The cluster autoscaler tried to create a new node to accommodate your pod, but it hit a GCP quota limit.
To fix this, you have several options:
1. Short-term fix: Increase your GCP quotas
- Go to the GCP Console → IAM & Admin → Quotas
- Look for quotas in europe-west3-c zone, particularly:
- CPU quotas
- Instance group size quotas
- Request an increase for the relevant quotas
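You can also check the relevant quotas from the command line before filing the increase request (a sketch; `europe-west3` is assumed from the error message above):
```bash
# Regional quotas (CPUs, in-use addresses, etc.) for the affected region
gcloud compute regions describe europe-west3 --format="yaml(quotas)"

# Project-wide quotas
gcloud compute project-info describe --format="yaml(quotas)"
```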
2. Resource Management fixes:
- Adjust your pod resource requests in your work pool configuration:
```yaml
job_configuration:
  kubernetes:
    pod_spec:
      containers:
        - name: prefect-job
          resources:
            requests:
              memory: "256Mi"  # Adjust these values based on your actual needs
              cpu: "200m"
            limits:
              memory: "512Mi"
              cpu: "500m"
```
3. Cluster Configuration fixes:
- Increase the size of your node pool manually instead of relying on autoscaling
- Or adjust your node pool's autoscaling configuration:
```bash
gcloud container clusters update your-cluster-name \
  --node-pool=your-pool-name \
  --enable-autoscaling \
  --min-nodes=2 \
  --max-nodes=4  # Adjust based on your quota and needs
```
4. Immediate Workaround:
- If possible, schedule your flows to run sequentially rather than concurrently until you can implement one of the above solutions
- Or terminate any unnecessary pods/workloads to free up resources
The good news is that your pod eventually did get scheduled (`Successfully assigned prefect/prefect-worker-dlqcs-cvptf`), but this delay is not ideal for production workloads.
Would you like me to provide more specific guidance on any of these solutions? For example, I can help you:
1. Calculate appropriate resource requests based on your workload
2. Guide you through the GCP quota increase process
3. Help optimize your node pool configuration
Rasmus
07/09/2025, 11:30 AM
Marvin
07/09/2025, 11:31 AM
Marvin
07/09/2025, 11:32 AM
- CPU: minimum 0.25 vCPU (250m)
- Memory: minimum 0.5 Gi
Any pods requesting less than these minimums will be automatically increased to meet them.
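You can verify what Autopilot actually granted a pod by reading the resources back off the running spec (generic kubectl; swap in your pod name and namespace):
```bash
# Show the (possibly Autopilot-adjusted) requests and limits on a flow-run pod
kubectl get pod <pod-name> -n prefect -o jsonpath='{.spec.containers[*].resources}'
```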
3. Your Helm Chart Configuration:
Yes, your Helm chart values can definitely affect this. You should check:
```yaml
# In your values.yaml
worker:
  resources:
    requests:
      cpu: "500m"     # Must be at least 250m in Autopilot
      memory: "512Mi" # Must be at least 0.5Gi in Autopilot
    limits:
      cpu: "1000m"
      memory: "1Gi"
```
To fix this in GKE Autopilot:
1. Update your Work Pool configuration:
```yaml
job_configuration:
  kubernetes:
    pod_spec:
      containers:
        - name: prefect-job
          resources:
            requests:
              memory: "1Gi"  # Safe Autopilot minimum
              cpu: "500m"    # Safe Autopilot minimum
            limits:
              memory: "2Gi"  # Adjust based on your workload
              cpu: "1000m"   # Adjust based on your workload
```
2. Check your Helm chart values:
- Ensure resource requests/limits are properly set
- Make sure you're not using any features that Autopilot doesn't support (like DaemonSets)
3. Consider Workload Optimization:
- If possible, batch your workloads instead of running them concurrently
- Use work queues to control the flow of concurrent executions
- Consider using Prefect's built-in concurrency controls at the flow level
4. Monitor Resource Usage:
```bash
# Check actual resource usage of your pods
kubectl top pods -n your-namespace

# Check Autopilot cluster metrics
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq
```
The key differences with Autopilot vs standard GKE are:
- You can't manage nodes directly
- Resource requirements are strictly enforced
- Scaling is fully automated but must comply with Autopilot's requirements
Would you like me to:
1. Help review your current Helm chart configuration?
2. Provide specific resource configurations optimized for your workload in Autopilot?
3. Suggest alternative scheduling strategies that might work better with Autopilot's constraints?
Rasmus
07/09/2025, 11:34 AMauth-string
with the value of the auth string
existingSecret: ""
# -- the number of old ReplicaSets to retain to allow rollback
revisionHistoryLimit: 10
# -- number of worker replicas to deploy
replicaCount: 1
resources:
# -- the requested resources for the worker container
requests:
memory: 256Mi
cpu: 100m
# ephemeral-storage:
# -- the requested limits for the worker container
limits:
memory: 24Gi
cpu: 10000m
# ephemeral-storage:
# ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
livenessProbe:
enabled: false
config:
# -- The number of seconds to wait before starting the first probe.
initialDelaySeconds: 10
# -- The number of seconds to wait between consecutive probes.
periodSeconds: 10
# -- The number of seconds to wait for a probe response before considering it as failed.
timeoutSeconds: 5
# -- The number of consecutive failures allowed before considering the probe as failed.
failureThreshold: 3
# -- The minimum consecutive successes required to consider the probe successful.
successThreshold: 1
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
podSecurityContext:
# -- set worker pod's security context runAsUser
runAsUser: 1001
# -- set worker pod's security context runAsNonRoot
runAsNonRoot: true
# -- set worker pod's security context fsGroup
fsGroup: 1001
# -- set worker pod's seccomp profile
seccompProfile:
type: RuntimeDefault
# -- in case of Localhost value in seccompProfile.type, set seccompProfile.localhostProfile value below
# localhostProfile: /my-path.json
# ref: https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass
# -- priority class name to use for the worker pods; if the priority class is empty or doesn't exist, the worker pods are scheduled without a priority class
priorityClassName: ""
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-container
containerSecurityContext:
# -- set worker containers' security context runAsUser
runAsUser: 1001
# -- set worker containers' security context runAsNonRoot
runAsNonRoot: true
# -- set worker containers' security context readOnlyRootFilesystem
readOnlyRootFilesystem: true
# -- set worker containers' security context allowPrivilegeEscalation
allowPrivilegeEscalation: false
# -- set worker container's security context capabilities
capabilities: {}
## ref: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
# -- extra labels for worker pod
podLabels: {}
## ref: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
# -- extra annotations for worker pod
podAnnotations: {}
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
# -- affinity for worker pods assignment
affinity: {}
## ref: https://kubernetes.io/docs/user-guide/node-selection/
# -- node labels for worker pods assignment
nodeSelector: {}
## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
# -- tolerations for worker pods assignment
tolerations: []
## List of extra env vars
## e.g:
## extraEnvVars:
## - name: FOO
## value: "bar"
# -- array with extra environment variables to add to worker nodes
extraEnvVars: []
# -- name of existing ConfigMap containing extra env vars to add to worker nodes (deprecated, use extraEnvVarsCMs)
extraEnvVarsCM: ""
# -- names of existing ConfigMaps containing extra env vars to add to worker nodes
extraEnvVarsCMs: []
# -- name of existing Secret containing extra env vars to add to worker nodes (deprecated, use extraEnvVarsSecrets)
extraEnvVarsSecret: ""
# -- names of existing Secrets containing extra env vars to add to worker nodes
extraEnvVarsSecrets: []
# -- additional sidecar containers
extraContainers: []
# -- array with extra volumes for the worker pod
extraVolumes: []
# -- array with extra volumeMounts for the worker pod
extraVolumeMounts: []
# -- array with extra Arguments for the worker container to start with
extraArgs: []
## ServiceAccount configuration
serviceAccount:
# -- specifies whether a ServiceAccount should be created
create: true
# -- the name of the ServiceAccount to use. if not set and create is true, a name is generated using the common.names.fullname template
name: ""
# -- additional service account annotations (evaluated as a template)
annotations: {}Marvin
07/09/2025, 11:34 AM
Rasmus
07/09/2025, 11:35 AM
Marvin
07/09/2025, 11:35 AM
Marvin
07/09/2025, 11:35 AM
1. Worker Resources: Raise the worker's requests to meet Autopilot's minimums:
```yaml
worker:
  resources:
    requests:
      memory: "1Gi"  # Autopilot minimum is 0.5Gi
      cpu: "500m"    # Autopilot minimum is 250m
    limits:
      memory: "2Gi"  # Adjust based on your workload
      cpu: "1000m"   # Adjust based on your workload
```
2. Autoscaling Configuration: Your current autoscaling settings look good, but in Autopilot context:
```yaml
worker:
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80
```
This is fine, but remember that Autopilot will also enforce its own scaling decisions based on pod resource usage.
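If the chart creates a HorizontalPodAutoscaler for the worker, you can watch how it behaves alongside Autopilot's own scaling (generic kubectl; nothing here is specific to the chart):
```bash
# See current/target utilization and replica counts for any HPA in the namespace
kubectl get hpa -n your-namespace
kubectl describe hpa -n your-namespace
```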
3. Init Container Resources: Your current init container resources are too low for Autopilot:
```yaml
initContainer:
  resources:
    requests:
      memory: "512Mi"  # Increase from 256Mi to meet Autopilot minimum
      cpu: "250m"      # Increase from 100m to meet Autopilot minimum
    limits:
      memory: "1Gi"
      cpu: "1000m"
```
4. Add Recommended Autopilot Configurations:
```yaml
worker:
  # Add pod disruption budget for better availability
  podDisruptionBudget:
    enabled: true
    minAvailable: 1
  # Add topology spread constraints for better distribution
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: prefect-worker
```
5. Security Context: Your security context looks good, but make sure it's also applied to the main container:
```yaml
worker:
  containerSecurityContext:
    runAsUser: 1001
    runAsNonRoot: true
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
```
To implement these changes:
1. Update your values file with these configurations
2. Upgrade your Helm release:
```bash
helm upgrade prefect-worker prefect/prefect-worker -f values.yaml -n your-namespace
```
3. Verify the changes:
```bash
# Check pod resources
kubectl describe pod -l app.kubernetes.io/name=prefect-worker -n your-namespace

# Monitor pod status
kubectl get pods -l app.kubernetes.io/name=prefect-worker -n your-namespace -w
```
Common issues to watch for in Autopilot:
1. Pods stuck in "Pending" due to insufficient quota
2. Pods being terminated due to resource pressure
3. Scheduling delays due to resource fragmentation
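A quick way to catch all three of these as they happen is to watch warning events in the namespace:
```bash
# Recent warnings (quota, eviction, scheduling) in chronological order
kubectl get events -n your-namespace --field-selector type=Warning --sort-by=.lastTimestamp
```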
Would you like me to:
1. Provide a complete values.yaml file with all these configurations?
2. Help you calculate the optimal resource values based on your specific workload?
3. Show you how to monitor resource usage in Autopilot to fine-tune these values?