# marvin-ai
@Marvin I am triggering run at scale of a deployment scheduled by worker deployed on kubernetes. Currently there is more time taken in scheduling and pending state than running. How can I configure the work pool to speed up scheduling and execution
thought for 435.2 seconds
Great question. When you're triggering many runs at once, the time before "Running" is usually dominated by two things:

1) Prefect scheduling latency (worker polling, concurrency limits)
2) Kubernetes pod startup/scheduling (image pulls, resource pressure)

Below are the levers you can tune in Prefect 3.x to reduce both.

**Prefect-side tuning (work pool, queues, workers)**

- Increase work pool and/or work queue concurrency limits
  - If your pool/queue has a low limit (or a limit you outgrow during bursts), runs will sit in Scheduled/Pending waiting for a slot.
  - Commands:

```
prefect work-pool inspect "my-pool"
prefect work-pool set-concurrency-limit "my-pool" 50
prefect work-queue set-concurrency-limit "my-queue" 50
```
- Docs: Work pool CLI reference
- Run more worker replicas
  - Workers are horizontally scalable. If you run your Kubernetes worker as a Deployment, scale it:

```
kubectl scale deployment/my-prefect-worker --replicas=5
```
- Multiple workers will fetch and submit runs in parallel, speeding up the transition from Scheduled → Pending.
- Reduce the worker polling interval and increase prefetch
  - Lower the poll interval so workers pick up runs faster; increase prefetch so near-future runs are claimed early.
  - Set these env vars on your worker container:

```yaml
- name: PREFECT_WORKER_QUERY_SECONDS
  value: "2"        # default is 10
- name: PREFECT_WORKER_PREFETCH_SECONDS
  value: "60"       # default is 10
- name: PREFECT_WORKER_HEARTBEAT_SECONDS
  value: "10"       # default is 30
```
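As a sanity check on how much the polling interval matters, here is a quick back-of-the-envelope calculation. This is a sketch that assumes worker polls are evenly staggered across the interval (the best case for multiple workers); real pickup latency also includes server response time.

```python
# Rough estimate of how long a newly scheduled run waits before
# some worker's next poll picks it up, assuming evenly staggered polls.
def worst_case_pickup_seconds(query_interval: float, workers: int) -> float:
    return query_interval / workers

def average_pickup_seconds(query_interval: float, workers: int) -> float:
    return query_interval / (2 * workers)

# Default: 1 worker polling every 10 s -> up to 10 s spent in Scheduled.
print(worst_case_pickup_seconds(10, 1))  # 10.0
# Tuned: 5 workers polling every 2 s -> at most 0.4 s.
print(worst_case_pickup_seconds(2, 5))   # 0.4
```

The point: replicas and poll interval multiply together, so tuning both shrinks the Scheduled window much faster than tuning either alone.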
- This reduces time spent in "Scheduled" waiting for a worker poll.
- Keep queues focused
  - If a worker is polling many queues, consider targeting a specific queue or sharding queues by workload to reduce contention:

```
prefect worker start -p "my-pool" -q "my-queue"
```
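Putting the replica count, polling env vars, and queue targeting above together, a worker Deployment might look like the sketch below. All names, the namespace defaults, the image tag, and the API URL are placeholders to adapt to your cluster.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-prefect-worker        # placeholder name
spec:
  replicas: 5                    # more replicas -> faster parallel submission
  selector:
    matchLabels:
      app: prefect-worker
  template:
    metadata:
      labels:
        app: prefect-worker
    spec:
      containers:
        - name: worker
          image: prefecthq/prefect:3-latest   # or your own prebuilt image
          command: ["prefect", "worker", "start", "-p", "my-pool", "-q", "my-queue"]
          env:
            - name: PREFECT_API_URL
              value: "https://your-prefect-api/api"   # placeholder
            - name: PREFECT_WORKER_QUERY_SECONDS
              value: "2"
            - name: PREFECT_WORKER_PREFETCH_SECONDS
              value: "60"
            - name: PREFECT_WORKER_HEARTBEAT_SECONDS
              value: "10"
```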
**Kubernetes-side tuning (base job template)**

- Use a fast, prebuilt image and avoid heavy image pulls
  - Make sure your deployment's base job template references an image that already contains all dependencies.
  - Use `imagePullPolicy: IfNotPresent`, configure `imagePullSecrets` if needed, and prefer a nearby registry.

Example snippet you can bake into your pool's base job template:
```json
{
  "job": {
    "spec": {
      "template": {
        "spec": {
          "containers": [
            {
              "name": "flow",
              "image": "your-registry/your-image:tag",
              "imagePullPolicy": "IfNotPresent",
              "resources": {
                "requests": {
                  "cpu": "500m",
                  "memory": "512Mi"
                },
                "limits": {
                  "cpu": "1",
                  "memory": "1Gi"
                }
              }
            }
          ],
          "imagePullSecrets": [
            {
              "name": "your-pull-secret"
            }
          ]
        }
      }
    }
  }
}
```
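For the "image that already contains all dependencies" part, a minimal Dockerfile sketch is below; the base tag, requirements file, and paths are placeholders, not a prescribed layout.

```dockerfile
# Prebuilt image so pods don't install dependencies at startup
FROM prefecthq/prefect:3-latest

# Bake Python dependencies into the image at build time
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Ship the flow code in the image as well (placeholder path)
COPY . /opt/prefect/flows
WORKDIR /opt/prefect/flows
```

With dependencies baked in and `imagePullPolicy: IfNotPresent`, nodes that have run a flow before skip the pull entirely.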
- Right-size resource requests and add scheduling hints
  - If Pending is long due to "Insufficient CPU/Memory," reduce requests/limits or add node selectors/tolerations to find capacity faster. For high-priority jobs, set a `priorityClassName`.
- Update your pool with a tuned base job template
  - Get a starting template, edit it, then update:
```
prefect work-pool get-default-base-job-template --type kubernetes > base.json
# edit base.json as above
prefect work-pool update "my-pool" --base-job-template base.json
```
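If you do need the scheduling hints mentioned above, they live in the pod spec (alongside `containers`) inside base.json. The fragment below is illustrative only: the priority class name, node labels, and taint key/value are assumptions about your cluster.

```json
{
  "priorityClassName": "high-priority",
  "nodeSelector": {
    "workload": "flows"
  },
  "tolerations": [
    {
      "key": "dedicated",
      "operator": "Equal",
      "value": "flows",
      "effect": "NoSchedule"
    }
  ]
}
```

A `priorityClassName` only helps if the class exists in the cluster, and tolerations only help if matching taints steer other workloads away from those nodes.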
- Guide: Kubernetes deployment infra guide

**Operational checks**

- Inspect your work pool/queue for limits and recent activity:

```
prefect work-pool inspect "my-pool"
prefect work-queue inspect "my-queue"
```
- For pods stuck in Pending, check events to see the cause:

```
kubectl describe pod <pod-name>
```
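To scan for stuck pods across the namespace rather than one at a time, these read-only kubectl commands can help (standard kubectl; add `-n <namespace>` as needed):

```
# List all pods still waiting to be scheduled
kubectl get pods --field-selector=status.phase=Pending

# Recent events, newest last, to spot pull/scheduling failures quickly
kubectl get events --sort-by=.lastTimestamp
```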
Typical culprits: image pull backoffs, insufficient resources, missing imagePullSecrets.

**A reasonable starting configuration**
- Work pool/queue concurrency: set to match the number of pods you want running concurrently (plus a small buffer), e.g. 50 for bursty workloads.
- Worker replicas: 3–5 replicas for high-throughput submission.
- Worker env:

```
PREFECT_WORKER_QUERY_SECONDS=2
PREFECT_WORKER_PREFETCH_SECONDS=60
PREFECT_WORKER_HEARTBEAT_SECONDS=10
```
- Kubernetes base job template: prebuilt image, IfNotPresent, right-sized requests, and (if needed) priorityClassName/nodeSelector/tolerations.

**References**

- Work pool CLI reference
- Kubernetes deployment infra guide

If you can share:

- Your work pool type and base job template snippet
- Current work/queue concurrency limits
- Worker replica count and worker env vars
- A sample `kubectl describe` from a slow-to-start pod

…I can give more targeted recommendations (e.g., exact resource requests/limits and template edits).