Tristan Moeller
07/30/2025, 6:58 AM
Marvin
07/30/2025, 6:58 AM
[...] (kubectl get events) around the time when this started happening?
Also, I recommend checking:
1. The Prefect Cloud UI for any work pool configuration changes
2. Kubernetes pod events and logs for the affected deployments
3. Resource utilization metrics around the time this started happening
This behavior starting suddenly at a specific time (yesterday at 8 AM) suggests there might have been a change or event that triggered this. Looking at your infrastructure logs and events around that specific time would be particularly valuable.
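To make that time-window check concrete, here is a minimal sketch that filters Kubernetes events to a chosen window. It assumes the output of `kubectl get events -o json` has been saved to a file. The function names and the file name are hypothetical, and the window boundaries must be set to match your own time zone.

```python
import json
from datetime import datetime, timezone


def load_events(path):
    """Load a saved `kubectl get events -o json` dump (path is your choice)."""
    with open(path) as fh:
        return json.load(fh)


def parse_k8s_time(value):
    """Parse an RFC 3339 timestamp such as '2025-07-29T08:00:05Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def events_in_window(event_list, start, end):
    """Return sorted (time, reason, object, message) tuples within [start, end]."""
    rows = []
    for item in event_list.get("items", []):
        # Either timestamp field may be null, so fall back to creation time.
        raw = (
            item.get("lastTimestamp")
            or item.get("eventTime")
            or item["metadata"]["creationTimestamp"]
        )
        when = parse_k8s_time(raw)
        if start <= when <= end:
            obj = item.get("involvedObject", {})
            target = f"{obj.get('kind', '')}/{obj.get('name', '')}"
            rows.append((when, item.get("reason", ""), target, item.get("message", "")))
    return sorted(rows)
```

For example, calling `events_in_window(load_events("events.json"), start, end)` with `start` and `end` as timezone-aware `datetime` values half an hour either side of the incident would list only the events worth reading first. Note that Kubernetes retains events for a limited time by default, so older events may already be gone.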
Would you be able to provide any of this additional information to help narrow down the cause?
Tristan Moeller
07/30/2025, 8:16 AM
"prefect.resource.id": "prefect.flow-run.06889d3d-4863-79bd-8000-d19ee1fe2a40",
"prefect.resource.name": "imposing-centipede",
"prefect.run-count": "0",
"prefect.state-message": "Deployment concurrency limit reached.",
"prefect.state-name": "Cancelled",
"prefect.state-timestamp": "2025-07-30T08:12:09.872441+00:00",
"prefect.state-type": "CANCELLED"
But there are no flows running for this deployment, yet new runs are still being cancelled?
Marvin
07/30/2025, 8:18 AM
prefect deployment inspect "your-flow-name/your-deployment-name"
2. Immediate Solutions:
- Try pausing and then resuming the deployment to reset its state
- If possible, temporarily increase the concurrency limit to see if it resolves the issue
- Check if there are any "zombie" flow runs stuck in a transitional state
3. Root Cause Investigation:
- Look for any flow runs that might have terminated abnormally (like pod OOMKills or sudden terminations) around yesterday 8 AM
- Check if there were any deployment updates or changes around that time
- Verify if there are any flow runs marked as "Running" in the UI that aren't actually running
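The last check can be sketched as a simple cross-reference between what the API reports and what is actually alive in the cluster. This is only an illustration: the record shape (`id`, `deployment_id`) and both function names are hypothetical, and it assumes you have already collected the flow runs reported as "Running" and the flow run IDs that still have a live pod (for example, from pod labels).

```python
from collections import Counter


def find_zombie_runs(running_runs, live_flow_run_ids):
    """Return runs reported as Running that have no live infrastructure."""
    live = set(live_flow_run_ids)
    return [run for run in running_runs if run["id"] not in live]


def leaked_slots_by_deployment(zombies):
    """Count how many concurrency slots each deployment appears to be holding."""
    return Counter(run["deployment_id"] for run in zombies)
```

If a deployment shows as many leaked slots as its concurrency limit, that would be consistent with new runs being cancelled with "Deployment concurrency limit reached." even though nothing is actually executing.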
Could you:
1. Share what concurrency limit is set for this deployment (if any)?
2. Check if there are any flow runs stuck in "Running" state in the UI?
3. Confirm if this is happening to all deployments or just specific ones?
This issue typically requires some cleanup of the concurrency tracking state. If you need to reset it completely, you might need to:
1. Pause the deployment
2. Cancel any pending runs
3. Wait a few minutes
4. Resume the deployment
Would you like me to provide more specific steps for any of these approaches?