roshan Kumar
09/09/2025, 2:47 AM
Marvin
09/09/2025, 2:47 AM
Marvin
09/09/2025, 2:52 AM
- Work pool concurrency limit
- Check the pool’s current limit:
prefect work-pool inspect "<your-pool>"
- If needed, raise it:
prefect work-pool set-concurrency-limit "<your-pool>" 80
- Docs: Work pools
- Deployment-level concurrency_limit
- A deployment can cap its own concurrent runs; workers won’t exceed this even if they have capacity.
- Inspect a deployment:
prefect deployment inspect "flow-name/deployment-name"
- Look for concurrency_limit in the output.
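If it helps, a quick filter like this surfaces just that field (the flow/deployment names are hypothetical; assumes a Unix shell with grep):
# hypothetical names; print any concurrency-related fields from the inspect output
prefect deployment inspect "my-flow/my-deployment" | grep -i concurrency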
- Global/tag-based concurrency limits
- Tag-based/global limits can throttle both flow and task runs if tags on your runs are bound to a limit.
- List limits:
prefect concurrency-limit ls
- Inspect a specific tag:
prefect concurrency-limit inspect "<tag>"
- Docs:
- Global concurrency limits
- Tag-based concurrency limits
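If you’re not sure which tags matter, a small loop can check each candidate (the tag names below are hypothetical; substitute the tags your flows and tasks actually use):
# hypothetical tag names; inspect each tag you suspect is throttling runs
for tag in prod etl nightly; do
  prefect concurrency-limit inspect "$tag"
done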
- Worker polling interval and heartbeat
- If workers poll infrequently, pickup will be slow and Late runs can accumulate.
- Check these env vars on your worker processes:
- PREFECT_WORKER_QUERY_SECONDS (how often workers poll; default ~15s)
- PREFECT_WORKER_HEARTBEAT_SECONDS (default ~30s)
- Docs: Worker health checks
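As a rough sketch (the pool name and values are placeholders), faster polling can be set per worker via environment variables at startup:
# hypothetical pool name; poll more often while draining a Late backlog
export PREFECT_WORKER_QUERY_SECONDS=5
export PREFECT_WORKER_HEARTBEAT_SECONDS=30
prefect worker start -p "my-pool" --limit 10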
- Work queues and pausing
- Ensure workers are pointing at the right pool/queues and none are paused.
- If you’re using --work-queue, confirm those queues actually have the Late runs.
- Worker capacity vs machine capacity
- Process workers execute flows locally; even if concurrency is 10, the host may be CPU/RAM constrained. Check worker logs for failures to start or repeated retries; also monitor system resource usage.
What I’d look at next (quick data to share)
1) Output of:
prefect work-pool inspect "<your-pool>"
prefect concurrency-limit ls
2) One example deployment’s details:
prefect deployment inspect "flow-name/deployment-name"
3) How you’re starting workers (the exact command and any env vars like PREFECT_WORKER_QUERY_SECONDS).
4) A few lines of worker logs around polling/claiming. If possible, increase verbosity:
PREFECT_LOGGING_LEVEL=DEBUG prefect worker start -p "<your-pool>"
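If it’s easier, something like this (names are placeholders) bundles those outputs into one file you can share:
# hypothetical pool/deployment names; collect the diagnostics in one place
{
  prefect work-pool inspect "my-pool"
  prefect concurrency-limit ls
  prefect work-queue ls
  prefect deployment inspect "my-flow/my-deployment"
} > prefect-diagnostics.txt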
Common resolutions
- Pool limit was low — raise it to match desired aggregate capacity.
- Deployment concurrency_limit was set (e.g., 2) — increase or remove it.
- A tag (e.g., “prod”) had a global concurrency limit — increase/reset:
prefect concurrency-limit inspect "prod"
prefect concurrency-limit reset "prod"
- Workers were polling too slowly — reduce PREFECT_WORKER_QUERY_SECONDS (e.g., to 5–10s) to speed pickup.
- Workers pointed at the wrong queue(s) or queues were paused — fix targeting or resume.
If you can share the inspect outputs and your worker start command, I can pinpoint the constraint quickly.
roshan Kumar
09/09/2025, 3:03 AM
Marvin
09/09/2025, 3:03 AM
Marvin
09/09/2025, 3:07 AM
1) Per-worker concurrency limit (--limit)
- Each worker caps how many flow runs it will execute at once; set this explicitly when starting it:
prefect worker start -p "<your-pool>" --limit 10
- If you don’t pass --limit, a process worker won’t run many flows concurrently. Can you share the exact command you use to start the workers?
2) Work-queue concurrency or queue targeting
- Even if the pool is 80, a specific work queue can have its own concurrency limit, or you might be pointing workers at queues with very few runs.
- Check queues and their limits:
prefect work-queue ls
prefect work-queue inspect <queue-id>
- If you’re using --work-queue on workers, confirm these are the queues where your Late runs actually are. Otherwise omit --work-queue so workers poll all queues in the pool.
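For illustration (queue names are hypothetical), the two targeting patterns look like this:
# Option A: pin each worker to a specific queue (hypothetical queue names; run each in its own terminal)
prefect worker start -p "<your-pool>" --work-queue "high-priority" --limit 10
prefect worker start -p "<your-pool>" --work-queue "default" --limit 10
# Option B: one worker that polls every queue in the pool
prefect worker start -p "<your-pool>" --limit 10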
3) Worker polling/health vs claim rate
- Default polling is fine for most workloads, but when you’re trying to drain a large Late backlog, reducing the interval can help with pickup speed:
PREFECT_WORKER_QUERY_SECONDS=5 prefect worker start -p "<your-pool>" --limit 10
- Docs: Worker health checks
4) Host resource limits
- Process workers spawn one OS process per flow run. If the host is CPU/RAM constrained or has a low ulimit -n, the worker may in practice only run 1–2 flows concurrently. Check system metrics and worker logs for failures to start/claim more runs.
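A quick host-side sanity check might look like this (assumes a Linux host):
# rough checks to run while the worker is busy (Linux assumed)
ulimit -n      # open-file limit for the shell that launches the worker
nproc          # CPU cores available
free -h        # memory headroom
uptime         # load averages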
Quick diagnostics to run
- Verify per-worker limit and queue targeting in the worker start commands you use.
- Inspect queues for unexpected concurrency limits or pausing:
prefect work-queue ls
prefect work-queue inspect <queue-id>
- Turn on debug logs for one worker to see claim decisions:
PREFECT_LOGGING_LEVEL=DEBUG prefect worker start -p "<your-pool>" --limit 10
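One way to capture that (the pool name is a placeholder, and the grep keywords are only a guess at useful log lines) is to tee the worker output to a file and filter it afterwards:
# hypothetical pool name; keep a copy of the DEBUG output for later filtering
PREFECT_LOGGING_LEVEL=DEBUG prefect worker start -p "my-pool" --limit 10 2>&1 | tee worker-debug.log
# in another shell, once a few polls have happened:
grep -iE "poll|claim|eligible|concurrency" worker-debug.log | tail -n 50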
If you share:
- The exact worker start command you’re using
- Output of prefect work-queue ls and an inspect for the queue(s)
- A short DEBUG log snippet around polling/claiming
…I can pinpoint why only 1–2 runs are being picked up and suggest the precise fix. Links for reference:
- Work pools: Work pools
- Global/tag concurrency: Global concurrency limits and Tag-based concurrency limits
roshan Kumar
09/09/2025, 3:23 AM
Marvin
09/09/2025, 3:23 AM
Marvin
09/09/2025, 3:25 AM
- Queue targeting
- If workers are started with --work-queue, they’ll only see runs in those queues. If your backlog is spread across multiple queues, most polls won’t find many eligible runs.
- Quick checks:
prefect work-queue ls
prefect work-queue preview -p "<your-pool>"
prefect work-queue inspect <queue-id>
prefect work-queue read-runs <queue-id>
- If unsure, try starting a worker without --work-queue so it polls all queues in the pool:
prefect worker start -p "<your-pool>" --limit 10
- Per-worker limit and actual worker start command
- Confirm each worker is actually started with --limit 10 (or a pool variable that sets the same). Without it, concurrency may be much lower or inconsistent.
- Please share the exact worker start command you use.
- Polling cadence vs. other workers claiming first
- With default polling, a worker may miss windows where many runs become eligible if other workers poll first. You can reduce the poll interval temporarily to see if pickup improves:
PREFECT_WORKER_QUERY_SECONDS=5 prefect worker start -p "<your-pool>" --limit 10
- Docs: Worker health checks
- Queue-level concurrency or pausing
- Pools can be wide open (80), but individual queues can still be limited or paused. Check:
prefect work-queue ls
prefect work-queue inspect <queue-id>
- If limits exist on specific queues, raise or clear them.
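If your CLI version exposes them (worth confirming with prefect work-queue --help), the queue-level knobs can be adjusted roughly like this (the queue ID is a placeholder):
# placeholder queue ID; confirm the available subcommands with: prefect work-queue --help
prefect work-queue resume <queue-id>
prefect work-queue clear-concurrency-limit <queue-id>
prefect work-queue set-concurrency-limit <queue-id> 80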
- Host resource pressure
- Process workers spawn one process per flow run. If the host is CPU/RAM constrained, you’ll see effective concurrency drop below your --limit. Monitor CPU/memory and worker logs while the worker is “stuck” at 1–2.
Highly informative debug logs
Run one worker at DEBUG to see why it isn’t claiming more (e.g., “no eligible runs”, “max concurrency reached”, etc.):
PREFECT_LOGGING_LEVEL=DEBUG prefect worker start -p "<your-pool>" --limit 10
If you can share:
- Your worker start command(s)
- Output of prefect work-queue ls and prefect work-queue preview -p "<your-pool>"
- A short DEBUG log snippet around polls and claims
…I can pinpoint whether this is queue targeting, poll timing, resource limits, or something else. Links for reference:
- Work pools and queues: Work pools
- Concurrency limits: Global concurrency limits and Tag-based concurrency limits
roshan Kumar
09/09/2025, 3:32 AM
Marvin
09/09/2025, 3:32 AM
Marvin
09/09/2025, 3:33 AM
- To confirm whether polling backoff is what you’re seeing, run one worker with DEBUG logging:
PREFECT_LOGGING_LEVEL=DEBUG prefect worker start -p "<your-pool>" --limit 10
You’ll see logs like “No eligible flow runs found... sleeping for X seconds” if backoff is the cause.
- Preview and read runs in the queues to verify eligibility at the time of polling:
prefect work-queue preview -p "<your-pool>"
prefect work-queue ls
prefect work-queue inspect <queue-id>
prefect work-queue read-runs <queue-id>
Practical mitigations
- Reduce the polling interval to cut the worst-case quiet period:
PREFECT_WORKER_QUERY_SECONDS=5 prefect worker start -p "<your-pool>" --limit 10
Docs: Worker health checks
- Ensure each worker has a per-worker limit set (you mentioned 10; just double-check the actual start command):
prefect worker start -p "<your-pool>" --limit 10
- Broaden queue targeting if you’re using --work-queue and your backlog is split across queues. Consider removing --work-queue so workers pull from all queues in the pool, or consolidate runs into fewer queues.
- Optional: increase prefetch to catch runs slightly before their scheduled time (helps with bursty schedules):
prefect worker start -p "<your-pool>" --limit 10 --prefetch-seconds 60
- Check host resources. If the machine is CPU/RAM constrained, a worker may struggle to actually start more processes even if it claims them.
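If one host has headroom, a rough sketch for adding aggregate capacity (the pool name and numbers are placeholders; 8 workers at --limit 10 roughly matches an 80 pool limit) could be:
# hypothetical: launch several workers so aggregate capacity approaches the pool limit
for i in $(seq 1 8); do
  PREFECT_WORKER_QUERY_SECONDS=5 prefect worker start -p "<your-pool>" --limit 10 &
done
wait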
If you can share:
- The exact worker start command
- Output of prefect work-queue ls and prefect work-queue preview -p "<your-pool>"
- A short DEBUG log snippet showing a 2-minute gap
…I can confirm whether backoff is the root cause or if something else (queue limits/targeting or resource pressure) is in play.