Bart Shappee
10/12/2025, 10:10 PM

Marvin
10/12/2025, 10:24 PM
- Work pool concurrency:
- Confirm the pool has the expected limit and open slots:
prefect work-pool inspect "<your-pool>"
# Adjust if needed
prefect work-pool set-concurrency-limit "<your-pool>" 300
- Docs: Work pools
- Global concurrency limits (can block all work across pools):
prefect global-concurrency-limit ls
prefect global-concurrency-limit inspect "<name>"
- Docs: Global concurrency limits
- Tag-based task/flow concurrency limits (stuck slots):
prefect concurrency-limit ls
prefect concurrency-limit inspect "<tag>"
# if you find orphaned slots from crashed runs
prefect concurrency-limit reset "<tag>"
- Are runs actually PENDING, or are there many RUNNING/SUBMITTED elsewhere?
prefect flow-run ls --state RUNNING --limit 1000
prefect flow-run ls --state SUBMITTED --limit 1000
If 300 runs are already RUNNING/SUBMITTED, those runs are consuming the limit (a quick way to tally runs by state with the Python client is sketched below).
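If you'd rather tally this with the Python client than eyeball the CLI output, something like the following should work. This is a minimal sketch that assumes recent Prefect 2.x/3.x client APIs (get_client, read_flow_runs, and the FlowRunFilter classes), so double-check the names against your installed version:
```python
# Minimal sketch: count non-terminal flow runs by state type.
# Assumes recent Prefect 2.x/3.x client APIs; verify against your version.
import asyncio
from collections import Counter

from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterState,
    FlowRunFilterStateType,
)
from prefect.client.schemas.objects import StateType


async def count_runs_by_state() -> Counter:
    """Count PENDING vs RUNNING flow runs via the API."""
    async with get_client() as client:
        runs = await client.read_flow_runs(
            flow_run_filter=FlowRunFilter(
                state=FlowRunFilterState(
                    type=FlowRunFilterStateType(
                        any_=[StateType.PENDING, StateType.RUNNING]
                    )
                )
            ),
            limit=2000,  # raise this if your backlog is larger
        )
    return Counter(run.state_type for run in runs)


if __name__ == "__main__":
    print(asyncio.run(count_runs_by_state()))
```
If the RUNNING count is at or near 300, the limit is simply saturated rather than stuck.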
Clearing a backlog of stale PENDING runs
- Cancel them (a safe, explicit state change; cancelled runs can be re-run later if needed):
- UI: filter by state = PENDING and cancel in bulk (if your UI version supports bulk actions).
- CLI (one-off IDs):
prefect flow-run cancel <flow-run-id>
- Docs: Cancel workflows
- If you need to clean up in bulk from the CLI, you can list runs and then cancel them by ID. The CLI does not (yet) provide a native bulk-cancel, so most teams either use the UI or a short script against the API client, e.g. to cancel all PENDING runs older than X days; a sketch of such a script follows.
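Here is roughly what that script could look like. Treat it as a minimal sketch rather than a drop-in tool: it assumes recent Prefect 2.x/3.x client APIs (get_client, read_flow_runs, set_flow_run_state), it filters on the run's created timestamp, and the server may route some runs through Cancelling rather than straight to Cancelled:
```python
# Sketch only: cancel PENDING flow runs older than a cutoff.
# Assumes recent Prefect 2.x/3.x client APIs; review (and dry-run) before using.
import asyncio
from datetime import datetime, timedelta, timezone

from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterState,
    FlowRunFilterStateType,
)
from prefect.client.schemas.objects import StateType
from prefect.states import Cancelled


async def cancel_stale_pending_runs(older_than: timedelta = timedelta(days=2)) -> int:
    """Cancel PENDING runs created more than `older_than` ago; returns the count."""
    cutoff = datetime.now(timezone.utc) - older_than
    cancelled = 0
    async with get_client() as client:
        pending = await client.read_flow_runs(
            flow_run_filter=FlowRunFilter(
                state=FlowRunFilterState(
                    type=FlowRunFilterStateType(any_=[StateType.PENDING])
                )
            ),
            limit=1000,  # raise if your backlog is larger
        )
        for run in pending:
            # `created` is when the run was first persisted; swap in
            # `expected_start_time` if that better matches your notion of "stale".
            if run.created and run.created < cutoff:
                await client.set_flow_run_state(run.id, state=Cancelled())
                cancelled += 1
    return cancelled


if __name__ == "__main__":
    print(f"Cancelled {asyncio.run(cancel_stale_pending_runs())} stale PENDING runs")
```
Do a dry run first: comment out the set_flow_run_state call and print the run IDs to confirm the filter only catches the runs you expect.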
Why you may see “no jobs starting”
- Workers are offline: runs will keep queuing up, but nothing can start until a worker polls the pool.
- Concurrency limits are fully consumed: work pool limit, a deployment-level limit (if set), a global concurrency limit, or a tag-based limit could be at capacity.
- Stuck leases: orphaned slots from previously crashed runs can keep concurrency “full” until reset.
- Work pool paused: verify the pool is not paused.
Next steps / info I need to help pinpoint this
- Which concurrency “300 job limit” are you referring to: work pool concurrency, global concurrency, or a tag/deployment limit?
- What worker type(s) are you running (Kubernetes, Docker, Process), and are they showing as online with recent heartbeats?
- Do you have any global or tag-based concurrency limits configured?
- Are the 400+ runs truly PENDING, or are there many RUNNING/SUBMITTED runs as well?
If you share the work pool name(s) and which limit you're looking at, I can give you exact commands to inspect and, if needed, a safe script to cancel just the stale PENDING runs.

Bart Shappee
10/13/2025, 2:21 AM

Marvin
10/13/2025, 2:23 AM
- Check the work pool's status and concurrency limit, and preview upcoming work:
prefect work-pool inspect "<pool>"
prefect work-pool preview "<pool>" --hours 24
- Docs: Work pools
- See what’s actually running vs pending:
prefect flow-run ls --state RUNNING --state SUBMITTED --limit 1000
prefect flow-run ls --state PENDING --limit 1000
- Check for global or tag-based concurrency limits that might be full or stuck:
prefect global-concurrency-limit ls
prefect concurrency-limit ls
# Inspect a tag and reset if you find orphaned slots
prefect concurrency-limit inspect "<tag>"
prefect concurrency-limit reset "<tag>"
- Docs: Global concurrency limits
- Worker health:
- In the UI: Work Pool > Workers; verify recent heartbeats (a scripted heartbeat check is sketched after this list).
- Worker logs: look for repeated attempts to pick up the same runs, or infrastructure errors that keep runs from ever reaching SUBMITTED/RUNNING.
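If you want to check heartbeats from a script rather than the UI, a sketch like this should work. Note that read_workers_for_work_pool and the last_heartbeat_time field are based on recent client versions, so confirm against yours:
```python
# Sketch: list workers for a pool and flag stale heartbeats.
# Assumes recent Prefect client APIs; "<your-pool>" is a placeholder.
import asyncio
from datetime import datetime, timedelta, timezone

from prefect.client.orchestration import get_client

STALE_AFTER = timedelta(minutes=5)


async def check_worker_heartbeats(pool_name: str = "<your-pool>") -> None:
    now = datetime.now(timezone.utc)
    async with get_client() as client:
        workers = await client.read_workers_for_work_pool(work_pool_name=pool_name)
    for worker in workers:
        age = now - worker.last_heartbeat_time if worker.last_heartbeat_time else None
        status = "STALE" if age is None or age > STALE_AFTER else "ok"
        print(f"{worker.name}: last heartbeat {worker.last_heartbeat_time} ({status})")


if __name__ == "__main__":
    asyncio.run(check_worker_heartbeats())
```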
Preventing a recurrence
- Keep queues/pools segmented so one problematic deployment/tag doesn’t block everything else.
- Add an automation or a periodic script that cancels PENDING runs older than N hours when they're known to be stale (a scheduled-flow sketch follows this list).
- Docs: Cancel workflows
- If you're using tag or global concurrency limits, monitor them and reset stuck slots after crashed runs.
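One lightweight way to automate that cleanup is to wrap the cancel logic in a flow and serve it on an interval. A sketch, assuming the earlier cancel script was saved as cancel_stale.py (a hypothetical module name):
```python
# Sketch: run the stale-run cleanup on a schedule via a served flow.
# Assumes the earlier sketch was saved as cancel_stale.py (hypothetical name).
from datetime import timedelta

from prefect import flow

from cancel_stale import cancel_stale_pending_runs


@flow(log_prints=True)
async def cancel_stale_pending():
    cancelled = await cancel_stale_pending_runs(older_than=timedelta(hours=6))
    print(f"Cancelled {cancelled} stale PENDING runs")


if __name__ == "__main__":
    # Served flows run in this process on the given interval; no worker needed.
    cancel_stale_pending.serve(name="cancel-stale-pending", interval=3600)
```
A UI Automation that reacts to stale runs would work just as well; the served flow is simply the smallest self-contained option.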
If you can share:
- Which “300 job limit” this is (work pool limit, global concurrency limit, or something else)
- Your worker type(s) and whether they showed recent heartbeats
- Whether you use tag/global concurrency limits
I can help pinpoint which of the above caused the blockage and provide a small script to safely cancel stale PENDING runs older than X days.