Vincent Chéry
08/27/2025, 3:39 PM
My flows are deployed with the deploy() function and run on a VM with 3 vCPUs. Flows run in ephemeral Docker containers. I have a total of 40 flows which are scheduled to run at various frequencies - from every minute for some flows to once a day for others - with a maximum of 6 flows scheduled to run at the same time. When I activate just a few flows everything works fine, but when all schedules are activated, runs take a long time to start and runs start to accumulate in AwaitingConcurrencySlot and the system gets overwhelmed. As an example, for a run scheduled to start at 15:24:00, the Docker container is created at 15:23:55 (so 5 seconds ahead of the scheduled start time, which I understand is expected due to prefetch time), but then it's very slow to actually start doing something. These are the logs from the container:
15:24:42.028 | INFO | prefect.flow_runs.runner - Opening process...
15:25:17.208 | INFO | Flow run 'dashing-ocelot' - > Running set_working_directory step...
So between 15:23:55 and 15:24:42, nothing happens in the container, then there is this line saying "Opening process..." and it takes another 35 seconds for the flow to start running. In the end the flow starts running more than one minute late and the system can never recover and get back on time.
Marvin
08/27/2025, 3:48 PM
1) Cap concurrent starts at the work pool
- Set a concurrency limit on the pool and verify it:
prefect work-pool set-concurrency-limit -p YOUR_POOL 3
prefect work-pool inspect -p YOUR_POOL
- If you use work queues, check them too:
prefect work-queue ls
prefect work-queue inspect -q YOUR_QUEUE
- List any other concurrency limits that could cause the AwaitingConcurrencySlot state:
prefect global-concurrency-limit ls
prefect concurrency-limit ls
2) Keep the image warm and lean
- Pre-pull your flow image onto the VM and reuse it across flows. Avoid “always pull”:
- In your Docker work pool's job variables, set image_pull_policy to "if_not_present" and ensure the image is present on the host.
- Build one shared image with all dependencies already installed and your flow code baked in, so each run doesn't need to fetch/resolve anything.
- Turn on stream_output in the work pool to see earlier logs from the child process.
3) Smooth your schedules so runs don’t stack at :00
- Offset high-frequency schedules by a few seconds and stagger minute-level runs to reduce simultaneous cold starts.
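For example (a sketch, not from the thread; the flow, pool, and image names are placeholders), minute-level deployments can be staggered with Interval schedules whose anchor_date values are offset by a few seconds each:
from datetime import datetime, timedelta, timezone
from prefect import flow
from prefect.schedules import Interval

@flow
def my_minutely_flow():
    ...

# Offset each deployment's anchor by 10 seconds so their containers
# don't all cold-start at the top of the minute.
for i, name in enumerate(["ingest-a", "ingest-b", "ingest-c"]):
    schedule = Interval(
        interval=timedelta(minutes=1),
        anchor_date=datetime(2025, 1, 1, 0, 0, 10 * i, tzinfo=timezone.utc),
    )
    my_minutely_flow.deploy(
        name=name,
        work_pool_name="YOUR_POOL",   # placeholder pool name
        image="your-image:latest",    # placeholder, pre-built image
        build=False, push=False,      # reuse the existing image
        schedules=[schedule],
    )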
4) Use process-based runs for “tiny” minute flows
- For very short, frequent flows, consider a process work pool on that VM. Process runs avoid Docker cold start overhead and will start faster. Use Docker only for the heavier, less-frequent flows.
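A hedged sketch of what that could look like, assuming a process-type work pool named "local-process" already exists and the code is pulled from Git (the repository URL and entrypoint are placeholders):
from prefect import flow

# Deploy a lightweight flow to a process work pool so it runs directly
# on the VM, skipping Docker container creation entirely.
flow.from_source(
    source="https://github.com/your-org/your-repo",  # placeholder repo
    entrypoint="flows/tiny_flow.py:tiny_flow",       # placeholder entrypoint
).deploy(
    name="tiny-flow-process",
    work_pool_name="local-process",  # assumed process-type work pool
    cron="* * * * *",                # every minute
)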
5) Reduce Python import time
- Move heavy imports (e.g., pandas, big SDKs) inside tasks/functions rather than top-level module imports.
- Ensure you’re not doing heavy network I/O in module import paths.
- If you deploy from source (e.g., Git), confirm the source fetch isn’t happening at runtime for each run.
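As a small illustration (the names are hypothetical), the pandas import below is deferred from module scope into the task body so its cost is only paid when the task runs, not while the run is starting up:
from prefect import flow, task

# Avoid heavy imports (pandas, big SDKs) at module scope: they run while
# the child process is loading the flow, before any task executes.

@task
def build_report(rows: list[dict]):
    import pandas as pd  # imported lazily, only when the task runs
    return pd.DataFrame(rows).describe()

@flow
def reporting_flow():
    build_report([{"value": 1}, {"value": 2}])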
6) Ensure the server/DB is not the bottleneck
- Check Prefect Server and Postgres logs when many runs fire at once. If Postgres is under-provisioned (CPU/IO/connection pool), the runner’s state transitions can be slow and add seconds per step.
- Make sure your server has adequate CPU/memory and Postgres has a sensible connection pool and disk IOPS.
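If you suspect connection-pool pressure, a rough check is to count Postgres connections by state during a scheduling spike (a sketch, assuming psycopg2 and direct access to the Prefect database; the DSN is a placeholder):
import psycopg2

# Placeholder DSN - point this at the Postgres instance backing Prefect Server.
conn = psycopg2.connect("postgresql://prefect:prefect@localhost:5432/prefect")
with conn, conn.cursor() as cur:
    # Count connections per state; a large "idle in transaction" group or a
    # total near max_connections suggests the database is the bottleneck.
    cur.execute(
        "SELECT state, count(*) FROM pg_stat_activity "
        "WHERE datname = current_database() GROUP BY state"
    )
    for state, count in cur.fetchall():
        print(f"{state or 'unknown'}: {count}")
conn.close()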
How to diagnose exactly where the time goes
- Turn on debug logging for both worker and flow containers:
export PREFECT_LOGGING_LEVEL=DEBUG
- Profile Python import time inside the container (helps confirm if imports are the big cost):
- Add this env var in the Docker job variables: PYTHONPROFILEIMPORTTIME=1
- Or run the flow command with -X importtime if you control the entrypoint.
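One way to wire that up for a single test deployment is via the env job variable of the Docker work pool (a sketch; the deployment, pool, and image names are placeholders):
from prefect import flow

@flow
def my_flow():
    ...

# Enable Python import-time profiling for this deployment only; the
# per-module timings appear in the container's output.
my_flow.deploy(
    name="import-profile-test",
    work_pool_name="YOUR_POOL",   # placeholder pool name
    image="your-image:latest",    # placeholder, pre-built image
    build=False, push=False,      # reuse the existing image
    job_variables={"env": {"PYTHONPROFILEIMPORTTIME": "1"}},
)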
- Watch Docker and host metrics during a spike:
- docker stats, htop, and iostat will show CPU/disk contention during container cold starts.
Marvin
08/27/2025, 3:48 PM
Commands to check your current limits:
prefect concurrency-limit ls
prefect global-concurrency-limit ls
prefect work-pool inspect -p YOUR_POOL
prefect work-queue inspect -q YOUR_QUEUE
A minimal “stabilize first” plan
- Set the work pool concurrency limit to 2–3.
- Stagger your minute schedules by a few seconds each.
- Pre-pull and reuse a single lean image with dependencies baked in and image_pull_policy=if_not_present.
- Turn on stream_output and DEBUG logging to confirm when flow loading begins and ends.
- Consider moving the lightest, most frequent flows to a process work pool on the VM.
A few questions to help tailor next steps
- How many workers are polling the pool, and is it just one VM?
- Do you have any work pool/queue/global or tag concurrency limits configured today?
- What’s the size of your Docker image and pull policy?
- Are your flows deployed from source (e.g., Git) or from an image that already contains the flow code?
- Can you share a short DEBUG-level worker log snippet around the moment multiple runs are scheduled at once?
If you share your work pool name and a run ID that shows this delay, I can suggest more targeted changes (including exact job variable toggles for your Docker work pool).
Vincent Chéry
08/27/2025, 3:56 PM
Marvin
08/27/2025, 3:58 PM
1) Cap parallel starts at the work pool
- Start conservatively and verify:
prefect work-pool set-concurrency-limit -p YOUR_POOL 2
prefect work-pool inspect -p YOUR_POOL
Adjust 2→3 after you measure.
2) Double‑check you don’t have “shared” tags behind your per‑deployment limit
- If any concurrency limit tag is shared across deployments, you’ll see widespread AwaitingConcurrencySlot. List all limits and find the tag each delayed run is waiting on:
prefect concurrency-limit ls
prefect global-concurrency-limit ls
prefect flow-run inspect --id <RUN_ID>
If you intend “per-deployment = 1”, ensure each deployment has its own unique tag for that limit.
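If it's easier than the CLI, a short client script can show the same thing (a sketch using Prefect's Python client; the run ID is a placeholder):
import asyncio
from uuid import UUID
from prefect import get_client

async def inspect_run(flow_run_id: str):
    # Read the flow run and print the state details that explain
    # what it is waiting on, plus its tags.
    async with get_client() as client:
        flow_run = await client.read_flow_run(UUID(flow_run_id))
        state = flow_run.state
        print(f"name:    {flow_run.name}")
        print(f"state:   {state.name if state else 'unknown'}")
        print(f"message: {state.message if state else ''}")
        print(f"tags:    {flow_run.tags}")

asyncio.run(inspect_run("00000000-0000-0000-0000-000000000000"))  # placeholder ID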
3) Tighten the Docker job template to minimize startup overhead
- In your Docker work pool job variables:
- image_pull_policy: "if_not_present" (so Docker doesn't pull on each run)
- stream_output: true (so you can see logs as soon as the child starts)
- If your API is on the host, consider network_mode: "host" to remove bridge/DNS overhead.
4) Stagger minute-level schedules
- Offset them by a few seconds so you’re not launching 5–6 containers simultaneously at :00.
5) Use a process work pool for the tiniest, very frequent flows
- Process runs skip Docker’s cold-start entirely and start much faster. Keep Docker for the heavier flows; move 1–2 of the minute-level flows to a process pool and compare start latency.
6) Reduce import time
- Keep heavy imports (pandas, big SDKs) inside tasks/functions instead of module scope.
- This can cut tens of seconds during the “Opening process…” → “set_working_directory” gap.
How to confirm where the time’s going
- Turn on DEBUG logs for both the worker and the flow containers:
export PREFECT_LOGGING_LEVEL=DEBUG
- Profile import time in the container for a test run:
- Add this env var in the work pool job variables for a run: PYTHONPROFILEIMPORTTIME=1
- You’ll get a per-module breakdown of import cost in the logs.
- Inspect the work pool/queue/limit state:
prefect work-queue ls
prefect work-queue inspect -q YOUR_QUEUE
prefect concurrency-limit ls
prefect global-concurrency-limit ls
Why Prefect 1 vs 3 can feel different here
- Prefect 3’s extra subprocess isolation and richer runtime services add some overhead per run. On plentiful CPU this is negligible; on a 3 vCPU host starting many containers at once, it becomes visible. Constraining concurrent starts and trimming import overhead typically brings startup back into line.
A couple of quick checks I’d love from you
- The exact tag you use to enforce “concurrency limit 1 per deployment” (to ensure they aren’t shared)
- Your Docker work pool job variables (at least image_pull_policy, stream_output, network_mode, any container_create_kwargs)
- One delayed flow run ID so we can run prefect flow-run inspect and see which concurrency tag it was waiting on
If you want, paste your pool/job template here and I'll annotate the exact fields to change.
Vincent Chéry
08/27/2025, 4:11 PM
On the deployment:
deployment = flow.to_deployment(name="unique-flow-name", schedules=[Schedule(cron="* * * * *")])
deployment.concurrency_limit = 1
On the work pool:
- image_pull_policy is None
- stream_output is True
- network_mode is None
- container_create_kwargs is None
Marvin
08/27/2025, 4:13 PM
- concurrency_limit = 1 per deployment is fine and explains the "AwaitingConcurrencySlot" you're seeing when a run starts late; the next scheduled run for that same deployment must wait.
- Since the same image worked on Prefect 1, the extra startup work in Prefect 3 (runner + child process + more state transitions) is likely what pushes you over the edge when multiple minute-level runs start at once on a 3 vCPU VM.
Changes I recommend trying (in this order)
1) Cap parallel starts at the work pool
- This prevents a CPU/disk storm during container cold starts.
- Start conservatively at 2 or 3 and adjust after measuring:
prefect work-pool set-concurrency-limit -p YOUR_POOL 2
prefect work-pool inspect -p YOUR_POOL
This will reduce late starts and, in turn, reduce how often “AwaitingConcurrencySlot” piles up.
2) Make container startup do the minimum possible
- Explicitly set Docker image_pull_policy to if_not_present (it's safer than leaving it None):
- In your deployment:
deployment = flow.to_deployment(
    name="unique-flow-name",
    schedules=[Schedule(cron="* * * * *")],
    work_pool_name="YOUR_POOL",
    job_variables={
        "image_pull_policy": "if_not_present",
        "stream_output": True
    }
)
- Or update the work pool job variables via the UI and set image_pull_policy: if_not_present.
- Optional if your Prefect API is on the same host: set network_mode: "host" in job variables to remove bridge/DNS overhead:
job_variables={"network_mode": "host"}
3) Give Prefect more head-start to hit the scheduled time
- Increase worker prefetch window so the container/child process can be fully warm by the scheduled minute:
export PREFECT_WORKER_PREFETCH_SECONDS=45
# then restart your worker
Default is 10s; 30–60s often works well for minute-level schedules.
4) Stagger minute schedules a bit
- Offset cron by a few seconds each so you aren’t launching 5–6 containers at :00. This alone can remove 30–60s of contention.
5) Consider a process work pool for the tiniest, high-frequency flows
- Process runs don’t pay the Docker cold start and usually begin immediately. Keeping Docker for heavier or less frequent flows and moving 1–2 minute-level flows to a process pool often normalizes the system.
6) Trim import overhead
- Move heavy imports (e.g., pandas, big SDKs) inside tasks/functions rather than at module import time.
- That cuts time between “Opening process…” and “Running set_working_directory step…”.
Sanity/diagnostic checks
- Verify there aren’t hidden shared limits:
prefect concurrency-limit ls
prefect global-concurrency-limit ls
prefect work-pool inspect -p YOUR_POOL
- Enable DEBUG logs to see earlier lifecycle events:
export PREFECT_LOGGING_LEVEL=DEBUG
- Profile imports inside the container for a single run:
- Add PYTHONPROFILEIMPORTTIME=1 in the job variables env for one deployment; compare where the seconds go.
- Watch host contention when a bunch of minute runs start: htop, docker stats, and iostat during :00 spikes will confirm CPU and disk pressure.
Why this helps
- Your per-deployment limit of 1 is fine; it just exposes lateness. The biggest wins here come from:
- Reducing how many containers try to cold-start at the same instant (pool cap + schedule staggering)
- Ensuring those containers don’t do extra work (pull policy, network setup)
- Starting them far enough ahead (prefetch window) to hit the schedule
If you can share:
- One delayed run ID (so we can confirm in flow-run inspect which limit it's waiting on)
- Your pool name and whether your API is on the same host (to decide on network_mode: "host")
…I can tailor exact job_variables and environment settings for your pool and deployments.
Brendan Dalpe
08/27/2025, 5:01 PM
Vincent Chéry
08/29/2025, 7:03 AM
Vincent Chéry
08/29/2025, 9:34 AM
Marvin
08/29/2025, 9:36 AM
- RRule schedule with a seconds-level offset:
from prefect import flow
from prefect.schedules import RRule

@flow
def my_flow():
    ...

schedule = RRule(
    rrule="FREQ=HOURLY;BYMINUTE=0;BYSECOND=5",
    timezone="UTC",
)

my_flow.deploy(
    name="hourly-5s-offset",
    schedules=[schedule],
)
- Example for weekdays at 09:00:05:
RRule(rrule="FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR;BYHOUR=9;BYMINUTE=0;BYSECOND=5", timezone="UTC")
- Interval with an anchor_date (to align to :00:05)
- Run every hour, aligned to 5 seconds after the hour:
from datetime import datetime, timedelta, timezone
from prefect.schedules import Interval
schedule = Interval(
    interval=timedelta(hours=1),
    anchor_date=datetime(2025, 1, 1, 0, 0, 5, tzinfo=timezone.utc),
)
Notes:
- Worker polling adds some jitter; a small offset (like 5s) may still start a bit later depending on your worker’s poll interval. If avoiding top-of-minute contention is the goal, consider a slightly larger offset (e.g., 15–30s).
- As a last resort with cron, you could add a tiny sleep at the start of your flow, but scheduling via RRule/Interval is cleaner:
import anyio
from prefect import flow
@flow
async def my_flow():
    await anyio.sleep(5)
    ...
Docs for reference:
- Schedules overview: Prefect 3.x schedules
- Creating deployment schedules: Create schedules for deployments
If you're configuring this via the UI or prefect.yaml instead of code, let me know and I can share the exact config snippet. What cadence and timezone do you need?
Vincent Chéry
08/29/2025, 9:41 AM
Marvin
08/29/2025, 9:41 AM
For every minute, at 5 seconds past the minute:
from prefect import flow
from prefect.schedules import RRule
@flow
def my_flow():
    print("Running at 5 seconds past the minute")

schedule = RRule(
    rrule="FREQ=MINUTELY;BYSECOND=5",
    timezone="UTC",  # change if needed
)

my_flow.deploy(
    name="minutely-5s-offset",
    schedules=[schedule],
)
Notes:
- Workers poll on an interval, so you may see a small delay beyond 5s. If you need a larger buffer, consider 10–15s.
- Docs: Schedules and Create schedules for deployments