Amir Kamal
07/16/2025, 6:29 AM
Pod event 'FailedScheduling' (5 times) at 2025-07-16 06:26:33+00:00: 0/5 nodes are available: 5 Insufficient nvidia.com/gpu. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod.
Marvin
07/16/2025, 6:29 AM
This error means the pod is requesting an NVIDIA GPU, but none of the 5 nodes in the cluster has a free nvidia.com/gpu resource to satisfy it, and there is nothing the scheduler can preempt to make room. A few things to check:
1. If you don't actually need a GPU, remove the GPU resource requests/limits from your pod spec:
```yaml
resources:
  limits:
    nvidia.com/gpu: 1  # Remove or adjust this if GPU isn't needed
  requests:
    nvidia.com/gpu: 1  # Remove or adjust this if GPU isn't needed
```
2. If you do need GPUs:
- Verify your nodes have physical NVIDIA GPUs installed
- Check if the NVIDIA device plugin is installed in your cluster:
```bash
kubectl get pods -n kube-system | grep nvidia-device-plugin
```
- Verify GPU availability in your cluster:
```bash
kubectl describe nodes | grep nvidia.com/gpu
```
- Check if other pods are using the GPUs:
```bash
kubectl get pods -A -o custom-columns='NAME:.metadata.name,GPU:.spec.containers[*].resources.limits.nvidia\.com/gpu'
```
3. If you need GPUs but don't have enough:
- Scale down other GPU-using workloads (see the sketch after this list)
- Add more GPU nodes to your cluster
- Adjust your GPU resource requests to match availability
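For instance, here's a rough sketch of checking current GPU allocation and freeing a GPU by scaling down another workload. The deployment name and namespace are placeholders, not names from your cluster:
```bash
# Compare each node's allocated resources against its capacity;
# nvidia.com/gpu rows show how many GPUs are already requested
kubectl describe nodes | grep -A 8 "Allocated resources" | grep -E "Resource|nvidia.com/gpu"

# Scale down a GPU-holding workload to release its GPU
# ("training-job" and "ml-workloads" are placeholders -- use the names
#  reported by the pod listing above)
kubectl scale deployment training-job --replicas=0 -n ml-workloads

# Watch the pending pod get scheduled once a GPU frees up
kubectl get pods -w
```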
Since this appears to be a Kubernetes infrastructure issue related to Prefect workloads, you might want to consider:
1. Using a node selector or tolerations to ensure your workload lands on GPU-enabled nodes (a sketch follows this list)
2. Reviewing your Prefect deployment configuration to ensure GPU requests align with your actual needs
3. Using a different work pool if you have multiple pools with different resource configurations
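As an illustration of point 1, a pod spec fragment along these lines would pin the flow run pod to GPU nodes. The node label and taint key are assumptions, not values read from your cluster; GPU nodes are commonly labeled by the NVIDIA GPU Operator / GPU feature discovery and tainted by the cluster admin, so check what your own nodes actually carry:
```yaml
# Sketch only -- verify the label and taint with:
#   kubectl get nodes --show-labels
#   kubectl describe node <gpu-node> | grep Taints
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"     # assumed GPU-node label
  tolerations:
    - key: nvidia.com/gpu              # common taint on dedicated GPU nodes
      operator: Exists
      effect: NoSchedule
  containers:
    - name: flow
      resources:
        limits:
          nvidia.com/gpu: 1
```
In a Prefect Kubernetes work pool, these fields would typically go into the work pool's base job template (the pod spec it generates for flow runs) rather than into the flow code itself.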
Would you like me to:
1. Research specific Prefect configurations for GPU workloads?
2. Look for similar issues others have encountered with Prefect GPU deployments?
3. Help you verify your deployment configuration?
Just let me know what would be most helpful and I'll dive deeper into that aspect.