Hello good people! Since this morning we have an issue on our production. We have a setup with a Kubernetes block and a work pool with one work queue that has 7 slots available. Whenever we start more than 7 deployments at once, some of the deployments get status `Late`, which is normal behaviour, whereas some of them go to `pending` and after a while get status `crashed`. Please, if anybody has any idea where to look, I would greatly appreciate it. Everything was running fine until today. We are spinning our own containers with prefect version `2.19.2`. Here I paste the agent VM k8s logs:
These are the events in the pending pod:
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  2m49s  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Warning  FailedScheduling  2s     default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
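
The FailedScheduling events mean the single node does not have enough free CPU or memory for the pod's resource requests, and there is nothing the scheduler can evict to make room, so the pod sits in `pending`. A minimal way to compare what the node has left against what the flow-run pod is asking for (a sketch; the node name is a placeholder, the pod name is taken from the logs below):

kubectl describe node <node-name> | grep -A 8 "Allocated resources"
kubectl get pod accelerated-vulture-s24rm-z6fb2 -o jsonpath='{.spec.containers[0].resources}'

If the requests exceed what the node can allocate, lowering the CPU/memory requests in the Kubernetes block's job manifest (or the work pool's base job template, depending on the setup), or adding node capacity, would be the usual fix.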
[11:34 AM] Amir Damirov
These are the logs of the pending pod:
kubectl logs accelerated-vulture-s24rm-z6fb2
/usr/local/lib/python3.10/runpy.py:126: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
09:35:13.030 | INFO    | Flow run 'accelerated-vulture' - Downloading flow code from storage at None
09:35:19.541 | INFO    | prefect.engine - Engine execution of flow run '9a031110-81c3-4789-9757-25fa3d7bc140' aborted by orchestrator: Unable to take work pool or work queue concurrency slot for flow run
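
That last line is the Prefect orchestrator refusing the run: by the time the pod finally got scheduled, no work pool or work queue concurrency slot was free, so the engine aborted and the pod exited, which would line up with the `crashed` status described above. A sketch of where to look first, assuming the standard Prefect 2.x CLI (the pool and queue names here are placeholders):

prefect work-pool inspect "<your-pool>"
prefect work-queue inspect "<your-queue>" --pool "<your-pool>"

and, if the limit itself needs changing:

prefect work-queue set-concurrency-limit "<your-queue>" 7 --pool "<your-pool>"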