Kiran
09/24/2025, 5:54 AM

Marvin
09/24/2025, 6:00 AM
1) Inspect the work pool configuration
prefect work-pool inspect "<your-pool-name>"
Look for:
- paused = False
- concurrency_limit (see if it’s set)
- queues configured for that pool
- If your deployment is set to a specific work_queue_name, ensure workers start with --work-queue "<that-queue>" or are polling it.
2) Check for global concurrency limits by tag
- If your deployment/flows use tags, a tag-based limit may be throttling runs in a Pending “AwaitingConcurrencySlot” state.
- CLI:
prefect concurrency-limit ls
If you see limits that match your tags, either increase them:
prefect concurrency-limit create <tag> <new_limit>
or inspect/reset if slots are stuck:
prefect concurrency-limit inspect <tag>
prefect concurrency-limit reset <tag>
- Docs: Global concurrency limits
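For context, tag-based limits created with prefect concurrency-limit throttle task runs that carry the tag. A minimal sketch, using a hypothetical "db" tag and limit:
from prefect import flow, task

# Any task run carrying the "db" tag counts against a tag-based limit,
# e.g. one created with: prefect concurrency-limit create db 2
@task(tags=["db"])
def load_chunk(n: int):
    ...

@flow
def etl():
    # With a "db" limit of 2, only two of these task runs execute at once;
    # the rest wait for a concurrency slot.
    load_chunk.map(range(10))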
3) Check the work pool’s concurrency limit
- If a limit is set on the pool, only that many runs will be allowed concurrently, regardless of worker count.
- Increase or clear it if needed:
prefect work-pool set-concurrency-limit "<your-pool-name>" 200
or
prefect work-pool clear-concurrency-limit "<your-pool-name>"
- Docs: Work pools
4) Look at a “Pending” run’s state details
- In the UI, click a Pending run and open “State details”. If you see:
- “AwaitingConcurrencySlot …” → tag/global concurrency limit
- “Awaiting work pool concurrency” → work pool concurrency limit
- No worker found for queue/pool → pool/queue mismatch
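If you prefer to check this programmatically instead of in the UI, here is a minimal sketch using the Python client (the flow run ID is a placeholder):
import asyncio
from prefect import get_client

async def show_state(flow_run_id: str):
    async with get_client() as client:
        fr = await client.read_flow_run(flow_run_id)
        # state.name and state.message usually explain why a run is still Pending
        print(fr.state.type, fr.state.name, fr.state.message)

asyncio.run(show_state("<flow-run-id>"))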
5) Infra capacity/quotas
- If you’re using Docker/Kubernetes/ECS, workers may be up but your cluster/cloud may only admit a subset of jobs at a time. That typically shows up as some runs starting quickly while others sit Pending until capacity frees up.
- Check your infra for resource quotas or pending jobs.
6) Optional: subflow vs independent runs
- If you’re calling prefect.deployments.run_deployment, the default is as_subflow=True. That’s fine, but you can set as_subflow=False to decouple from the parent flow’s state tracking if you prefer. Signature for reference:
from prefect.deployments import run_deployment
run_deployment(
    name="path/to/deployment",
    parameters={...},
    as_subflow=True,        # set to False if you want
    work_queue_name="...",  # ensure this matches your workers if you set it
    tags=[...]
)
- API ref: run_deployment
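And a quick sketch of how that call typically sits inside a parent flow (the deployment name is hypothetical):
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent():
    # as_subflow=True (the default) links the child run to this flow's state;
    # pass as_subflow=False to track it independently
    child_run = run_deployment(name="my-flow/my-deployment")
    return child_run.id

if __name__ == "__main__":
    parent()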
If you can share a bit more, I can pinpoint it quickly:
- The work pool name and whether it shows a concurrency limit
- Whether your sub-deployment sets a work_queue_name
- What the “State details” say for one of the Pending runs
- Whether you’re using tags on the deployment/flows
- Which infrastructure type your workers use (Process, Docker, Kubernetes, ECS, etc.)
Links for reference:
- Work pools: docs
- Global concurrency limits: docs

Kiran
10/15/2025, 6:32 AM

Marvin
10/15/2025, 6:38 AM
Short answer: unless you pass a work_queue_name to run_deployment, the flow run will use the deployment’s configured work queue; if the deployment doesn’t have one set, it goes to the work pool’s Default queue. Creating additional queues won’t change routing unless you point deployments (or each run_deployment call) at those queues.
Details and suggestions:
- In Prefect 3.x, run_deployment has an optional work_queue_name argument. If you omit it, the server uses the queue configured on the deployment; if none, it uses the pool’s Default queue. So all your runs will continue to land on “Default” unless you change that.
- Starting more workers will reduce pickup latency:
- Start more workers against the same pool/queue to increase capacity.
- If you start a worker with only --pool (no --work-queue), it will poll all queues in that pool. If you want strict capacity per queue, start each worker with --work-queue <name>.
- To actually use multiple queues, either:
- Set the deployment to target a specific queue, or
- Pass work_queue_name when you call run_deployment.
Examples:
- Create queues and start workers
prefect work-queue create high-priority -p my-pool
prefect work-queue create low-priority -p my-pool
# Worker pinned to a specific queue
prefect worker start -p my-pool --work-queue high-priority
# Worker that polls all queues in the pool
prefect worker start -p my-pool
- Route a run to a queue at call time
from prefect.deployments.flow_runs import run_deployment
# await requires an async context (e.g., an async flow or script)
await run_deployment(
    name="my-flow/my-deployment",
    parameters={"x": 1},
    work_queue_name="high-priority"
)
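If you are creating a large burst of runs from code, you can also fan them out concurrently. A sketch, assuming you don't want to wait for each child run to finish (timeout=0 returns as soon as the run is created):
import asyncio
from prefect.deployments.flow_runs import run_deployment

async def launch(n: int):
    return await asyncio.gather(*[
        run_deployment(
            name="my-flow/my-deployment",
            parameters={"x": i},
            work_queue_name="high-priority",
            timeout=0,  # return once the run is created; don't poll for completion
        )
        for i in range(n)
    ])

flow_runs = asyncio.run(launch(10))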
Helpful docs:
- Work pools and queues
- Workers
- CLI: work-queue
If you share how you want to partition runs (e.g., by priority, team, or environment), I can suggest a concrete queue/deployment layout.

Kiran
10/15/2025, 9:40 AM

Marvin
10/15/2025, 9:41 AM
- If you don’t pass work_queue_name in run_deployment, runs go to the deployment’s configured queue; if none, they land on the pool’s Default queue.
- A worker started with just --pool will poll all queues in that pool; a worker started with --work-queue <name> will poll only that queue. Either way, more workers = faster pickup.
- Multiple queues help for isolation, different job templates, prioritization, or separate concurrency limits. They don’t inherently speed up pickup unless you were previously bottlenecked by a queue-level concurrency limit.
What to do to speed up pickup for 748 runs:
- Start more workers for the same pool. You can point them all at the Default queue or let them poll the whole pool.
- Make sure you’re not throttled by pool or queue concurrency limits.
- Use an infrastructure-backed worker type (e.g., Kubernetes, ECS) if you need to fan out quickly.
Examples:
- Add workers against the same pool/queue
# Start 5 workers that poll the entire pool
prefect worker start -p my-pool
prefect worker start -p my-pool
prefect worker start -p my-pool
prefect worker start -p my-pool
prefect worker start -p my-pool
# Or pin them to the Default queue explicitly
prefect worker start -p my-pool --work-queue Default
- If you do want separate queues (e.g., for isolation), it’s fine to run one worker per queue:
prefect work-queue create q1 -p my-pool
prefect work-queue create q2 -p my-pool
prefect worker start -p my-pool --work-queue q1
prefect worker start -p my-pool --work-queue q2
Docs:
- Work pools and queues
- Workers
- CLI: work-queue
If you share your worker type (local, Docker, K8s, etc.), current pool/queue concurrency limits, and how you’re launching the 748 runs, I can suggest a concrete tuning plan.

Kiran
10/15/2025, 10:04 AM

Marvin
10/15/2025, 10:08 AM
- Create an infrastructure-typed work pool and customize its base job template:
prefect work-pool create my-k8s-pool --type kubernetes
prefect work-pool get-default-base-job-template --type kubernetes --file k8s-job.json
# Edit k8s-job.json (image, namespace, resources, env, serviceAccount, volumes, etc.)
prefect work-pool create my-k8s-pool --base-job-template k8s-job.json --overwrite
- Start a worker with the required extras installed:
# Kubernetes
uv run --with 'prefect[kubernetes]' prefect worker start -p my-k8s-pool
# ECS
uv run --with 'prefect[aws]' prefect worker start -p my-ecs-pool
- Point your deployment to the pool (either in your project’s prefect.yaml or in code), then trigger runs. The worker will create pods/tasks per run.
Kubernetes specifics (fast fan-out tips)
- Image startup time: Use a small image and pre-pull it on nodes if possible; set imagePullPolicy: IfNotPresent when appropriate. Private registries should be close to the cluster.
- Node scaling: Enable/verify the cluster autoscaler and ensure resource requests/limits are realistic so the scheduler can bin-pack efficiently.
- Permissions: Assign a serviceAccount with minimal RBAC rights to create Jobs/Pods in the target namespace.
- Resources: Tune CPU/memory requests to balance density vs throttling; too-large requests slow scheduling.
- Cleanup: set ttlSecondsAfterFinished on Jobs or let Prefect manage job cleanup; keep logs flowing to your logging stack.
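Several of the knobs above (image pull policy, service account, job TTL) can often be set per deployment via job_variables instead of editing the base job template. A hedged sketch; the variable names below are assumptions based on the default Kubernetes template, so verify yours with prefect work-pool get-default-base-job-template --type kubernetes (repo and entrypoint are hypothetical):
from prefect import flow

flow.from_source(
    source="https://github.com/acme/my-repo",
    entrypoint="flows/my_flow.py:my_flow",
).deploy(
    name="prod-k8s",
    work_pool_name="my-k8s-pool",
    job_variables={
        "namespace": "data-pipelines",          # must match a variable in your template
        "image_pull_policy": "IfNotPresent",
        "service_account_name": "prefect-runner",
        "finished_job_ttl": 300,                # seconds to keep finished Jobs around
    },
)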
ECS specifics (fast fan-out tips)
- Launch type: Fargate is serverless (fast to scale, no cluster management) but watch costs; EC2-backed ECS can be faster/cheaper if you manage capacity.
- Task definition: Right-size CPU/memory; specify subnets/security groups; attach an execution role and task role with least-privilege.
- Networking and images: Keep ECR images in-region; ensure NAT/bandwidth is not a bottleneck for image pulls.
- Logs: Configure CloudWatch logs for visibility and quick troubleshooting.
When you’ll see the biggest gains
- You have large bursts (like your 748 runs) and need many to start concurrently.
- Your current bottleneck is one machine’s capacity (local/process worker) rather than orchestration.
- You need isolation between runs, or different images per deployment.
What still matters for pickup speed

Marvin
10/15/2025, 10:08 AM
from prefect import flow

@flow
def my_flow():
    ...

if __name__ == "__main__":
    # Prefect 3.x pattern: load the flow from source and deploy it to the pool
    flow.from_source(
        source="https://github.com/acme/my-repo",
        entrypoint="flows/my_flow.py:my_flow"
    ).deploy(
        name="prod",
        work_pool_name="my-k8s-pool",
        image="ghcr.io/acme/my-image:latest"
    )
Docs to dive deeper
- Work pools and queues: Work pools and queues
- Workers overview: Workers
- Managing work pools and job templates: Manage Work Pools
- AWS ECS guide: Prefect AWS ECS guide
- Serverless infra patterns: Serverless infrastructure guide
If you share your target platform (K8s vs ECS), image strategy, and any concurrency limits, I can propose a concrete job template and scaling plan for your 748-run bursts.

Kiran
10/15/2025, 10:18 AM

Marvin
10/15/2025, 10:20 AM
Likely causes:
- Worker/queue targeting:
- Workers started with only --pool poll all queues in the pool.
- Workers started with --work-queue poll only that queue. If your runs all land on “Default” but most workers are pinned to other queues, they won’t pick anything up.
- Pool/queue state and limits:
- Pool or queue paused, or hidden concurrency limits on the pool/queue/deployment can throttle pickup.
- API rate limits/backoff:
- If workers hit Prefect Cloud API rate limits, they’ll back off and poll less often, delaying pickup.
- Queue filters/priority:
- Misconfigured queue filters (e.g., by tags) or priorities can make runs ineligible for some workers.
- Worker submission throughput vs infra startup:
- Workers claim runs fast but each submission might involve image pulls and provider API calls (K8s/ECS). If containers/pods/tasks are slow to start, it can look like “pickup” delay.
How to diagnose quickly
- Verify workers are polling the right place and are healthy
- Make sure all workers are started against the correct pool and, if pinned, the correct queue(s).
- In the UI, check Worker status and heartbeats; in logs, look for “claimed N runs” vs “no eligible runs”.
- Inspect queues and pools
prefect work-queue ls -p <your-pool> --verbose
prefect work-pool inspect <your-pool>
- Confirm: not paused, no concurrency limits, queue priorities/filters look correct, and the runs are actually in the queue your workers poll.
- Preview and read runs
prefect work-queue preview -p <your-pool>
prefect work-queue read-runs -p <your-pool>
- Preview shows which runs are eligible and why/why not.
- Check for API rate limiting in worker logs
- Look for 429 responses/backoff messages. If present, consider staggering worker starts or reducing poll aggressiveness.
- Check infra startup time
- For K8s/ECS: are images large or not cached? Is autoscaling slow? Are there node capacity issues, slow image registry pulls, or IAM/permissions hiccups?
Concrete fixes to try
- Ensure all workers can see the runs
- If you’re using multiple queues, either route runs to those queues or start workers with just --pool so they poll all queues.
- Add more workers or keep a few “submitter” workers but scale infra
- Typically a few workers suffice; fan-out comes from infra-backed jobs (K8s/ECS). If you’re already infra-backed, focus on infra startup time.
- Reduce infra cold-start
- Use smaller images, pre-pull or warm images, colocate registry with cluster/region, enable/verify K8s cluster autoscaler or ECS service quotas, and right-size CPU/memory requests.
- Re-check hidden limits
- Deployment-level concurrency limits, global concurrency limits, or queue/pool limits can be easy to miss.
- Avoid over-pinning workers to empty queues
- If in doubt, start workers like:
prefect worker start -p <your-pool> # polls all queues in the pool
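To double-check the “hidden limits” item above programmatically, here is a small audit sketch using the Python client (the pool name is a placeholder; attribute names assume the current client schemas):
import asyncio
from prefect import get_client

async def audit(pool_name: str):
    async with get_client() as client:
        pool = await client.read_work_pool(pool_name)
        print(f"pool paused={pool.is_paused} concurrency_limit={pool.concurrency_limit}")
        # Tag-based (task run) concurrency limits
        for lim in await client.read_concurrency_limits(limit=100, offset=0):
            print(f"tag={lim.tag} limit={lim.concurrency_limit}")

asyncio.run(audit("<your-pool>"))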
If you can share:
- Your work pool type (process/docker/kubernetes/ecs),
- How the 748 runs are created (schedule vs burst run_deployment),
- Whether the delay is from Scheduled -> Pending or Pending -> Running,
- And a sample of your worker logs around the delay,
I’ll pinpoint the likely bottleneck and suggest exact changes. Docs for reference:
- Work pools and queues: Work pools and queues
- Workers: Workers

Kiran
10/15/2025, 12:09 PM

Marvin
10/15/2025, 12:24 PM
Simplest: redirect the worker’s stdout/stderr to a file
mkdir -p /var/log/prefect
nohup prefect worker start -p my-pool >> /var/log/prefect/worker.log 2>&1 &
- Rotate with logrotate (recommended for long-running services).
Prefect-native: add a file handler via PREFECT_LOGGING_SETTINGS
- Prefect honors a standard Python logging dictConfig via the PREFECT_LOGGING_SETTINGS env var. Example (rotating file + console):
export PREFECT_LOGGING_SETTINGS='{
"version": 1,
"disable_existing_loggers": false,
"formatters": {
"standard": {
"format": "%(asctime)s | %(levelname)s | %(name)s - %(message)s"
}
},
"handlers": {
"console": {
"class": "logging.StreamHandler",
"formatter": "standard",
"stream": "<ext://sys.stdout>"
},
"file": {
"class": "logging.handlers.RotatingFileHandler",
"formatter": "standard",
"filename": "/var/log/prefect/worker.log",
"maxBytes": 10485760,
"backupCount": 5
}
},
"loggers": {
"prefect": { "handlers": ["console", "file"], "level": "INFO", "propagate": false },
"": { "handlers": ["console", "file"], "level": "INFO" }
}
}'
export PREFECT_LOGGING_LEVEL=INFO
prefect worker start -p my-pool
- Make sure the directory exists and is writable by the worker process.
- You can also store that JSON in a file and set the env var to its contents:
export PREFECT_LOGGING_SETTINGS="$(cat ./logging.json)"
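Since this is a standard dictConfig payload, you can sanity-check the file locally with the stdlib before handing it to Prefect (a quick check, assuming the file is named logging.json):
import json
import logging.config

with open("logging.json") as f:
    logging.config.dictConfig(json.load(f))  # raises if the config is malformed
logging.getLogger("prefect").info("file handler configured")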
Docker
- If the worker runs in Docker, either:
- Use Docker logs (recommended) and a logging driver (json-file, fluentd, etc.), or
- Bind mount a host directory and point your FileHandler to it:
docker run --name prefect-worker \
-e PREFECT_API_URL=... -e PREFECT_API_KEY=... \
-e PREFECT_LOGGING_SETTINGS="$(cat logging.json)" \
-v /var/log/prefect:/var/log/prefect \
prefecthq/prefect:3-latest prefect worker start -p my-pool
Kubernetes
- Best practice: write to stdout and let your cluster logging (e.g., EFK/CloudWatch) collect logs. If you must write to a file, mount a volume and point your FileHandler there:
- In your work pool’s base job template:
- Add a volume/volumeMount (emptyDir or PVC) at /logs
- Set PREFECT_LOGGING_SETTINGS env var to use "filename": "/logs/worker.log"
- Example env var snippet in the job template:
"env": [
{ "name": "PREFECT_LOGGING_SETTINGS",
"value": "{\"version\":1,\"handlers\":{\"file\":{\"class\":\"logging.handlers.RotatingFileHandler\",\"filename\":\"/logs/worker.log\",\"maxBytes\":10485760,\"backupCount\":5},\"console\":{\"class\":\"logging.StreamHandler\"}},\"loggers\":{\"prefect\":{\"handlers\":[\"console\",\"file\"],\"level\":\"INFO\",\"propagate\":false}}}"
}
]
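Hand-escaping that JSON is error-prone; one way to generate the one-line value is to build the dict in Python and let json.dumps do the escaping (a small helper, not Prefect-specific):
import json

logging_settings = {
    "version": 1,
    "handlers": {
        "file": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": "/logs/worker.log",
            "maxBytes": 10 * 1024 * 1024,
            "backupCount": 5,
        },
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "prefect": {"handlers": ["console", "file"], "level": "INFO", "propagate": False}
    },
}
# Paste the printed string as the value of PREFECT_LOGGING_SETTINGS in the job template
print(json.dumps(logging_settings))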
ECS
- Recommended: use the task’s awslogs/FireLens config to ship stdout/stderr to CloudWatch or another sink.
- If you need a file, mount an EFS volume (or ephemeral storage) at /logs and configure PREFECT_LOGGING_SETTINGS to write to /logs/worker.log.
Notes
- Worker logs are about the worker process (polling, claiming, submission). Flow run logs come from the run’s pod/task/container and show in the Prefect UI; they’re typically handled by your platform’s logging.
- You can raise/lower verbosity via PREFECT_LOGGING_LEVEL (e.g., DEBUG/INFO/WARNING).
If you tell me where your worker is running (local, Docker, K8s, ECS) I can give you a drop-in config snippet for that environment.