# ask-community
@Marvin I have created a main flow/deployment, and from this main deployment I am calling another deployment, effectively creating some 748 subflow runs of that deployment. I have provisioned around 10 workers, but I still see some runs getting stuck in the Pending state for long periods, around 30 minutes for some runs. Why is that, even though I have provisioned a good number of workers?
Marvin
thought for 368.2 seconds
Short answer: "Pending" means no worker has claimed the run yet. With 748 subflow runs, this usually comes down to one of these:
- A work-pool/queue mismatch (workers aren't polling the queue the sub-deployment uses)
- A concurrency limit (either on the work pool or a global/tag-based concurrency limit)
- Not enough available infra capacity even though workers are up (e.g., only a few jobs can be in flight due to provider quotas)
- The pool is paused or otherwise not accepting runs

A few targeted checks to find the culprit:

1) Confirm the pool/queue wiring
- Make sure your sub-deployment's work pool (and work queue, if set) matches what your 10 workers are polling.
- CLI:
```
prefect work-pool inspect "<your-pool-name>"
```
Look for:
- `paused = False`
- `concurrency_limit` (see if it's set)
- the queues configured for that pool
- If your deployment is set to a specific `work_queue_name`, ensure workers start with `--work-queue "<that-queue>"` or are otherwise polling it.

2) Check for global concurrency limits by tag
- If your deployment/flows use tags, a tag-based limit may be throttling runs in a Pending "AwaitingConcurrencySlot" state.
- CLI:
```
prefect concurrency-limit ls
```
If you see limits that match your tags, either increase them:
```
prefect concurrency-limit create <tag> <new_limit>
```
or inspect/reset if slots are stuck:
```
prefect concurrency-limit inspect <tag>
prefect concurrency-limit reset <tag>
```
- Docs: Global concurrency limits

3) Check the work pool's concurrency limit
- If a limit is set on the pool, only that many runs will be allowed concurrently, regardless of worker count.
- Increase or clear it if needed:
```
prefect work-pool set-concurrency-limit "<your-pool-name>" 200
```
or
```
prefect work-pool clear-concurrency-limit "<your-pool-name>"
```
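If you'd rather run these pool checks (steps 1 and 3) from Python instead of the CLI, here's a minimal sketch using the Prefect client; `<your-pool-name>` is a placeholder, and you may need to adjust field names to your Prefect version:

```python
import asyncio

from prefect.client.orchestration import get_client


async def check_pool(pool_name: str) -> None:
    async with get_client() as client:
        pool = await client.read_work_pool(work_pool_name=pool_name)
        # A paused pool, or a low/exhausted concurrency limit, will leave runs
        # in Pending no matter how many workers are polling.
        print("paused:", pool.is_paused)
        print("concurrency limit:", pool.concurrency_limit)


asyncio.run(check_pool("<your-pool-name>"))  # placeholder pool name
```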
- Docs: Work pools

4) Look at a "Pending" run's state details
- In the UI, click a Pending run and open "State details". If you see:
  - "AwaitingConcurrencySlot …" → tag/global concurrency limit
  - "Awaiting work pool concurrency" → work pool concurrency limit
  - no worker found for the queue/pool → pool/queue mismatch
- (A small client sketch for pulling these state details in bulk follows the example at the end of step 6.)

5) Infra capacity/quotas
- If you're using Docker/Kubernetes/ECS, workers may be up but your cluster/cloud may only admit a subset of jobs at a time. That typically shows up as some runs starting quickly while others sit Pending until capacity frees up.
- Check your infra for resource quotas or pending jobs.

6) Optional: subflow vs independent runs
- If you're calling `prefect.deployments.run_deployment`, the default is `as_subflow=True`. That's fine, but you can set `as_subflow=False` to decouple the child runs from the parent flow's state tracking if you prefer. Signature for reference:
```python
from prefect.deployments import run_deployment

run_deployment(
    name="path/to/deployment",
    parameters={...},
    as_subflow=True,  # set to False if you want
    work_queue_name="...",  # ensure this matches your workers if you set it
    tags=[...],
)
```
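For example, a minimal fan-out from a parent flow might look like the sketch below (the deployment name "child-flow/child-deployment" and the `batch` parameter are placeholders for illustration):

```python
from prefect import flow
from prefect.deployments import run_deployment


@flow
def parent():
    # Each call creates one subflow run of the child deployment.
    # timeout=0 submits the run and returns immediately instead of
    # blocking until the child run finishes.
    for i in range(748):
        run_deployment(
            name="child-flow/child-deployment",  # placeholder name
            parameters={"batch": i},             # placeholder parameter
            as_subflow=True,
            timeout=0,
        )


if __name__ == "__main__":
    parent()
```

With `timeout=0` the parent doesn't wait on each child, so all 748 runs land in the queue quickly; whether they leave Pending is then entirely down to the pool/queue and limit checks above.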
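And for step 4, if you'd rather pull Pending state details in bulk than click through the UI, here's a rough sketch using the client's flow-run filters (adjust the imports to your Prefect version):

```python
import asyncio

from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterState,
    FlowRunFilterStateType,
)
from prefect.client.schemas.objects import StateType


async def list_pending_runs() -> None:
    async with get_client() as client:
        runs = await client.read_flow_runs(
            flow_run_filter=FlowRunFilter(
                state=FlowRunFilterState(
                    type=FlowRunFilterStateType(any_=[StateType.PENDING])
                )
            ),
            limit=50,
        )
        for run in runs:
            # state.name distinguishes e.g. "AwaitingConcurrencySlot"
            # from a plain "Pending"; state.message often names the limit.
            print(run.id, run.state.name, run.state.message)


asyncio.run(list_pending_runs())
```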
- API ref: `run_deployment`

If you can share a bit more, I can pinpoint it quickly:
- The work pool name and whether it shows a concurrency limit
- Whether your sub-deployment sets a `work_queue_name`
- What the "State details" say for one of the Pending runs
- Whether you're using tags on the deployment/flows
- Which infrastructure type your workers use (Process, Docker, Kubernetes, ECS, etc.)

Links for reference:
- Work pools: docs
- Global concurrency limits: docs