Akram Gharbi
10/13/2025, 2:14 PM
prefect-process-worker-rs
• 1 replica initially, autoscaled by an HPA based on CPU usage
• Each worker runs with --limit 3 to restrict concurrency
• Connected to a single work pool (process-worker) and a single work queue (default)
The issue I’m facing is with horizontal scaling and run distribution:
• When I launch, say, 100 flow runs, the first worker pod picks up 3 runs (as expected) and starts working.
• CPU rises → the HPA spawns new worker pods (worker-2, worker-3, etc.).
• However, these new workers stay idle and don’t pick up any of the remaining queued runs.
• The runs sitting in the work queue remain assigned to the first worker or stay pending, even though the new pods are healthy and polling.
• Only when I trigger new flow runs do those new pods start taking work.
So it seems Prefect doesn’t immediately recognize new worker pods joining the pool — or maybe already-queued runs aren’t redistributed when new workers appear. Is there a recommended way to make new workers pick up queued runs faster?
Marvin
10/13/2025, 2:15 PM
Marvin
10/13/2025, 2:22 PM
1) Reduce or disable prefetch
- Set PREFECT_WORKER_PREFETCH_SECONDS=0 on your worker pods so they only claim runs that are ready to start immediately, instead of reserving upcoming runs. This prevents the first worker from “owning” a backlog that newly spawned pods can’t see.
2) Increase polling frequency (with care)
- Lower PREFECT_WORKER_QUERY_SECONDS (e.g., 2–5 seconds) so new pods notice available work quickly. This increases API traffic, so find a balance that works for your cluster.
3) Pre-warm some replicas
- HPA reacts after CPU rises on the first pod. If you typically burst to high concurrency, consider a minimum replica count > 1 so work is distributed immediately across multiple pods rather than waiting for scale-out.
4) Double-check pool/queue limits
- Ensure there isn’t a global work pool or tag-based concurrency limit that would cap total concurrent runs at 3; that would also keep new pods idle even if they’re healthy.
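On point 3, some context: the HPA computes desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the min/max bounds (per the Kubernetes HPA documentation). With minReplicas=1, scale-out can only happen after the first pod is already saturated. A minimal sketch of that math (the utilization numbers are illustrative):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float,
                         min_replicas: int = 1,
                         max_replicas: int = 10) -> int:
    """Kubernetes HPA scaling formula:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to [min_replicas, max_replicas]."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# A single busy pod at 90% CPU against an 80% target only scales to 2 pods,
# and only after CPU has already risen; starting from minReplicas=3 spreads
# the initial burst across pods immediately.
print(hpa_desired_replicas(1, 90, 80))  # -> 2
print(hpa_desired_replicas(3, 90, 80))  # -> 4
```

This is why pre-warming replicas helps: the formula is reactive by construction, so the first wave of runs always lands on whatever pods exist before CPU rises.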
Kubernetes example (env on your worker Deployment)
env:
  - name: PREFECT_API_URL
    value: http://<your-prefect-api>
  - name: PREFECT_API_KEY
    valueFrom:
      secretKeyRef:
        name: prefect-api
        key: key
  - name: PREFECT_WORKER_PREFETCH_SECONDS
    value: "0"
  - name: PREFECT_WORKER_QUERY_SECONDS
    value: "3"
Why this helps
- With prefetch disabled (or shortened), the first worker will not lease a large backlog; as soon as new pods come online, they’ll see those unclaimed runs and start them.
- Faster polling reduces the time-to-pickup for newly available runs.
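To make the prefetch effect concrete: a worker claims runs whose scheduled start falls within now + its prefetch window. The toy model below (an illustration of the concept, not Prefect's actual claiming code) shows how a 60-second window lets one worker reserve runs it cannot start yet, while a zero window leaves them unclaimed for other pods:

```python
from datetime import datetime, timedelta

def claimable(scheduled_starts, now, prefetch_seconds):
    # Toy model of prefetch: a worker may claim any run scheduled to start
    # on or before now + prefetch window. Not Prefect's real implementation.
    horizon = now + timedelta(seconds=prefetch_seconds)
    return [t for t in scheduled_starts if t <= horizon]

now = datetime(2025, 10, 13, 14, 0, 0)
# Four runs scheduled 0s, 5s, 30s, and 120s from now.
runs = [now + timedelta(seconds=s) for s in (0, 5, 30, 120)]

print(len(claimable(runs, now, prefetch_seconds=60)))  # 3: backlog reserved early
print(len(claimable(runs, now, prefetch_seconds=0)))   # 1: only the run due now
```

With prefetch at 0, the runs at +5s and +30s stay visible to whichever pod polls next, which is exactly the redistribution behavior you're after.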
Docs for reference
- Workers and how they poll/claim runs: Workers (Prefect 3.x)
- Work queues API shape (useful to understand deliveries/leases): Work queue API
A couple of clarifying questions to make sure there aren’t other blockers:
- Do you have any work pool/queue-level or tag-based concurrency limits configured?
- Roughly how long do the individual flow runs take, and are they scheduled “for now” or slightly in the future?
If you’d like, I can suggest HPA/KEDA patterns to scale on queue backlog (instead of CPU) so scale-out happens before the first worker gets saturated.