# ask-community
@Marvin we are running a Prefect deployment in k8s and are hitting issues when going over 200 parallel runs. Running the latest version, 3.4.17; also tried 3.4.1. We are using a Postgres database. When adding PgBouncer we configured the pool size to 200 and 300. Hitting SQLAlchemy timeouts. After that we tried increasing the pool size to 2000 and increased the pool size on PgBouncer to 3000. Same issues. We are seeing around 700 connections. We tried connecting directly to the db: 300 parallel jobs ran, but going beyond 1000 we ran into maxing out our db, and it's not a small one: 64 GB memory and 8 vCPU. What is going on?
thought for 632.0 seconds
Short answer: you're not hitting a "Prefect limit" at ~200-300; you're running into database connection pool behavior and DB CPU/lock contention under heavy load. Increasing pool sizes into the thousands makes it worse by flooding Postgres with more concurrent backends. The right fix is to keep per-process pools small, use PgBouncer correctly (transaction pooling with prepared statement caches disabled), and scale API replicas while running a single copy of the background services.

What's likely happening
- Each Prefect API process has its own SQLAlchemy pool. By default it's small (pool=5, overflow=10), but if you increase it across many pods/processes, total DB connections explode (you're seeing ~700). For example, 10 API pods each allowed pool_size=50 plus max_overflow=20 works out to 10 × 70 = 700 potential backends.
- If PgBouncer is in transaction pooling mode and asyncpg's prepared statement caches are on, you'll see timeouts and odd behavior, because prepared statements are not compatible with transaction pooling unless you disable the caches.
- If you scaled the API to multiple replicas and let every replica run the background services (scheduler, event persister, task_run_recorder, etc.), each replica holds additional long-lived DB connections and all of them compete for the same hot tables, increasing contention and timeouts.
- At 1000+ parallel runs, Postgres CPU becomes the bottleneck (8 vCPU is often the limiting factor for this write-heavy orchestration workload), regardless of RAM.

What to change

1) Point Prefect to PgBouncer and keep per-process pools small
- Use PgBouncer in transaction pooling mode and let it multiplex connections. Keep the API's SQLAlchemy pool small-ish rather than huge.
- Disable prepared statement caches in asyncpg (critical with PgBouncer transaction pooling).
- Example env for the Prefect API pods:
```
# Use PgBouncer as the DSN
PREFECT_SERVER_DATABASE_CONNECTION_URL=postgresql+asyncpg://user:pass@pgbouncer:6432/db

# Keep per-process pool small; let PgBouncer do the heavy lifting
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=10
PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=20
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=60
PREFECT_SERVER_DATABASE_CONNECTION_TIMEOUT=5

# With PgBouncer transaction pooling, disable prepared statement caches
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__STATEMENT_CACHE_SIZE=0
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__PREPARED_STATEMENT_CACHE_SIZE=0
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__APPLICATION_NAME=prefect-api
```
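To confirm the pods actually picked these values up, Prefect can print its effective settings and where each one came from. A quick sketch, assuming the API runs as a deployment named prefect-server (adjust the name to your release):
```
# Show effective database settings and their sources from inside an API pod
kubectl exec deploy/prefect-server -- prefect config view --show-sources | grep -i database
```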
Docs for these settings:
- Server DB configurations
- Settings reference

2) Configure PgBouncer sanely
- Keep max server connections to Postgres reasonable (e.g., 100-200 total), with many client connections allowed.
- Typical snippet:
```
[pgbouncer]
pool_mode = transaction
max_client_conn = 8000
default_pool_size = 100
reserve_pool_size = 20
server_reset_query = DISCARD ALL
ignore_startup_parameters = extra_float_digits
```
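Once this is in place, it's worth verifying that PgBouncer is actually capping server-side connections. A rough sketch of the checks; hosts, ports, users, and the database name are placeholders for your environment:
```
# PgBouncer admin console: per-pool client/server counts and traffic stats
psql -h pgbouncer -p 6432 -U admin pgbouncer -c "SHOW POOLS;" -c "SHOW STATS;"

# Postgres side: which components hold backends open, and in what state
psql -h postgres -p 5432 -U postgres -d prefect -c \
  "SELECT application_name, state, count(*) FROM pg_stat_activity GROUP BY 1, 2 ORDER BY 3 DESC;"
```
Server-side connections should hover near default_pool_size even while client connections spike; if they track the client count instead, the DSN is likely bypassing PgBouncer.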
- Ensure Postgres max_connections is set to cover PgBouncer's server-side needs, not the client-side count.

3) Run one set of background services
- If you've scaled your API deployment to N replicas, run a single replica with services, or start N replicas with "just the webserver" and one dedicated services instance.
- You can start a webserver without services like:
```
prefect server start --no-services
```
Then run services in a single pod (or keep a single full "server start" pod). CLI help for confirmation:
```
prefect server start --help
```
4) Scale API replicas for HTTP concurrency, not DB pool size
- Add more API pods behind PgBouncer and a load balancer. Keep each pod's pool modest. This gives you more request concurrency without exploding Postgres backends per process.

5) Reduce noisy writes if possible
- Very chatty logging multiplies DB writes. If logs are extremely verbose, consider reducing log volume or temporarily disabling logging-to-API during stress tests:
```
PREFECT_LOGGING_TO_API_ENABLED=false
```
- You can also increase the pool timeout slightly to reduce pool starvation under bursts.

6) Expect Postgres CPU to be the long pole at 1000+ parallel runs
- Orchestration is write-heavy. 8 vCPU will saturate quickly at that load even with pooling fixed. If you need stable 1000+ concurrent runs, plan on more CPU (e.g., 16-32 vCPU) and standard Postgres tuning.
- Helpful background on why "more connections" != "more throughput": More memory, more problems

Quick checks to confirm the root cause
- Look for "QueuePool limit of size X overflow Y reached, connection timed out" in the API logs; that's pool exhaustion, not Postgres refusing connections (see the sketch at the end of this message).
- Check PgBouncer stats (SHOW POOLS / SHOW STATS) to confirm transaction pooling and that server-side connections stay stable while client connections spike.
- Verify only one copy of the services is running.
- Verify prepared statement caches are disabled in Prefect's connect args when using PgBouncer transaction pooling.

A few clarifying questions to help us tune this precisely
- How many API replicas are you running in K8s? Are services running on each replica or just one?
- What PgBouncer pool_mode are you using (transaction vs session)? Can you share your PgBouncer config for default_pool_size, max_client_conn, and server_reset_query?
- Can you share the exact SQLAlchemy timeout error message from the API logs?
- Roughly how verbose are your flow run logs, and are you writing many artifacts per run?
- Are you using the Prefect Helm chart, and if so, which values are set for services and replicas?

Once you share those, I can suggest concrete numbers for pool sizes, PgBouncer, and replica counts for your target concurrency.

References
- Self-hosted Prefect 3
- Server DB configurations
- Settings reference
- More memory, more problems
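As referenced in the quick checks above, the fastest confirmation of pool exhaustion is grepping the API logs for the QueuePool message. A minimal sketch, assuming the API runs as a deployment named prefect-server:
```
# Scan recent API logs for SQLAlchemy pool-exhaustion errors
kubectl logs deploy/prefect-server --since=1h | grep -i "QueuePool limit"
```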
@Marvin I'll get that info. In the meantime, what do you recommend for a Prefect deployment that should be able to run 1000+ parallel runs, database-wise and server-wise? Also, why does the server use only 1 CPU when we defined the CPU request as 2 with no limits on k8s? We are using the Helm chart.
thought for 233.8 seconds
Great questions. Here's a blueprint we've seen work reliably for 1k+ concurrent runs, plus why you're seeing a single CPU used.

Target architecture for 1,000+ parallel runs

Database (Postgres + PgBouncer)
- Use PgBouncer in transaction pooling mode and let it multiplex connections. Keep per-process SQLAlchemy pools small; do not crank them into the hundreds/thousands.
- Disable asyncpg prepared statement caches when using transaction pooling; otherwise you'll hit timeouts/odd behavior with PgBouncer.
- Right-size Postgres for CPU: orchestration is write-heavy, so plan for 16-32 vCPU for stable 1k+ concurrency. RAM helps caching, but CPU is typically the bottleneck here.
- Keep server-side connections to Postgres moderate (e.g., 100-200), and allow many client connections to PgBouncer (e.g., thousands).
- Example PgBouncer config:
```
[pgbouncer]
pool_mode = transaction
max_client_conn = 8000
default_pool_size = 100
reserve_pool_size = 20
server_reset_query = DISCARD ALL
ignore_startup_parameters = extra_float_digits
```
- Postgres max_connections should cover PgBouncer's server-side pool needs (not the client-side max). Monitor CPU, locks, checkpointing, and autovacuum.

Prefect Server/API
- Scale horizontally with multiple API replicas behind PgBouncer. Keep per-pod DB pools small.
- Run exactly one copy of Prefect "services" (scheduler, event persister, task_run_recorder, etc.). If every API replica runs services, DB connections and contention multiply rapidly.
- Recommended SQLAlchemy/Prefect DB settings for API pods:
```
# Use PgBouncer DSN
PREFECT_SERVER_DATABASE_CONNECTION_URL=postgresql+asyncpg://user:pass@pgbouncer:6432/db

# Keep pool small per process; let PgBouncer handle concurrency
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=10
PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=20
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=60
PREFECT_SERVER_DATABASE_CONNECTION_TIMEOUT=5

# Critical with PgBouncer transaction pooling:
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__STATEMENT_CACHE_SIZE=0
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__PREPARED_STATEMENT_CACHE_SIZE=0

# Optional but handy for tracing in pg_stat_activity
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__APPLICATION_NAME=prefect-api
```
- If logs are very chatty, that alone can saturate Postgres at high concurrency. Consider reducing log volume or temporarily disabling logs-to-API to test throughput:
```
PREFECT_LOGGING_TO_API_ENABLED=false
```
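If disabling API logging entirely is too blunt an instrument, raising the runtime log level also cuts write volume significantly; WARNING below is just an illustrative threshold:
```
# Only WARNING and above are emitted (and therefore shipped to the API)
PREFECT_LOGGING_LEVEL=WARNING
```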
Why you only see ~1 CPU used by the server
- By default, prefect server start runs a single Uvicorn worker process with async concurrency. One worker process ≈ one CPU core utilized. Setting K8s CPU requests to 2 just reserves capacity; it does not create multi-process server workers.
- Prefect's CLI doesn't expose a "workers" argument. The recommended way to utilize multiple cores is to run multiple API replicas (pods) behind your Service/Ingress. This also gives you resilience and better horizontal scaling.

Using the Helm chart
- Scale API replicas: set the chart's replica count (e.g., server.replicaCount) to multiple pods for the webserver.
- Run services exactly once:
  - Option A: Start a single pod that runs everything (default) and N "web-only" pods by passing --no-services to those. Many chart versions expose extra args (e.g., server.extraArgs). If your chart does not, override the container args/command for that deployment.
  - Option B: Deploy two Helm releases: one release with replicas: 1 (services ON), and a second release with replicas: N passing --no-services (web-only) for horizontal API scaling.

Example values snippet (adjust to your chart's keys)
```
server:
  replicaCount: 4        # scale webserver horizontally
  extraArgs:
    - --no-services      # this deployment runs only the webserver
  env:
    - name: PREFECT_SERVER_DATABASE_CONNECTION_URL
      valueFrom:
        secretKeyRef:
          name: pgbouncer-dsn
          key: dsn
    - name: PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE
      value: "10"
    - name: PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW
      value: "20"
    - name: PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT
      value: "60"
    - name: PREFECT_SERVER_DATABASE_CONNECTION_TIMEOUT
      value: "5"
    - name: PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__STATEMENT_CACHE_SIZE
      value: "0"
    - name: PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__PREPARED_STATEMENT_CACHE_SIZE
      value: "0"
    - name: PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__APPLICATION_NAME
      value: "prefect-api"
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      memory: "2Gi"
```
Then deploy one separate "services" instance (replicas: 1) without --no-services. If your chart does not split this cleanly, run a second Helm release for the services pod.

Tuning checklist to reach 1k+ runs
- PgBouncer: transaction pooling; prepared statement caches disabled in Prefect; moderate default_pool_size; large max_client_conn.
- Prefect API: multiple replicas; small per-process SQLAlchemy pools; one services instance.
- Postgres: more CPU (16-32 vCPU for 1k+); monitor contention and vacuum; keep max_connections moderate; plan IOPS headroom.
- Logging/artifacts: reduce volume or batch to avoid overwhelming the DB.

Helpful references
- Server DB configurations: API DB config
- Settings reference: Settings
- Self-hosted/Helm: Self-hosted Prefect 3, Helm deployment
- Why more connections don't mean more throughput: More memory, more problems

If you can share your Helm values (replicas, args), PgBouncer config, and a sample SQLAlchemy timeout error, I can translate this into exact numbers for your environment.