itielOlenick (09/11/2025, 3:27 PM)
Marvin (09/11/2025, 3:38 PM)
1) Point the Prefect API at PgBouncer
```
# Use PgBouncer as the DSN
PREFECT_SERVER_DATABASE_CONNECTION_URL=postgresql+asyncpg://user:pass@pgbouncer:6432/db
# Keep the per-process pool small; let PgBouncer do the heavy lifting
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=10
PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=20
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=60
PREFECT_SERVER_DATABASE_CONNECTION_TIMEOUT=5
# With PgBouncer transaction pooling, disable prepared-statement caches
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__STATEMENT_CACHE_SIZE=0
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__PREPARED_STATEMENT_CACHE_SIZE=0
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__APPLICATION_NAME=prefect-api
```
Docs for these settings:
- Server DB configurations
- Settings reference
2) Configure PgBouncer sanely
- Keep max server connections to Postgres reasonable (e.g., 100–200 total), with many client connections allowed.
- Typical snippet:
```
[pgbouncer]
pool_mode = transaction
max_client_conn = 8000
default_pool_size = 100
reserve_pool_size = 20
server_reset_query = DISCARD ALL
ignore_startup_parameters = extra_float_digits
```
- Ensure Postgres max_connections is set to cover PgBouncer’s server-side needs, not the client-side count.
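To make that last point concrete, a sketch of the Postgres side (the value is illustrative, sized for the pool settings above plus headroom for superuser and monitoring sessions):

```
# postgresql.conf (sketch)
# Size for PgBouncer's server-side pools (default_pool_size + reserve_pool_size)
# plus some headroom -- NOT for the thousands of client connections
# PgBouncer itself accepts via max_client_conn.
max_connections = 150
```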
3) Run one set of background services
- If you’ve scaled your API deployment to N replicas, don’t run services in every replica: run N “webserver-only” replicas plus one dedicated services instance.
- You can start a webserver without services like:
prefect server start --no-services
Then run the services in a single pod (or keep a single full “server start” pod). Confirm the flag with prefect server start --help.
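In container terms, that split is just two different start commands (a sketch; confirm the exact flags with `prefect server start --help`):

```
# N web-only API replicas:
prefect server start --host 0.0.0.0 --no-services

# Exactly one services instance (the default invocation runs everything):
prefect server start --host 0.0.0.0
```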
4) Scale API replicas for HTTP concurrency, not DB pool size
- Add more API pods behind PgBouncer and a load balancer, keeping each pod’s DB pool modest. This gives you more request concurrency without exploding the number of Postgres backends per process.
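As a quick sanity check on the numbers in this thread (a back-of-the-envelope sketch; the figures are the example settings above, not measurements):

```python
# Connection budgeting: API pods -> PgBouncer -> Postgres.
# Values mirror the example settings in this thread.
pool_size = 10       # PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE
max_overflow = 20    # PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW
replicas = 4         # API pods behind the load balancer

# Worst case, each pod opens pool_size + max_overflow connections,
# all of which terminate at PgBouncer, not Postgres.
client_conns = replicas * (pool_size + max_overflow)
print(f"client connections into PgBouncer: {client_conns}")    # 120

# PgBouncer multiplexes those onto at most this many real backends:
default_pool_size = 100
reserve_pool_size = 20
server_conns = default_pool_size + reserve_pool_size
print(f"Postgres backends PgBouncer may open: {server_conns}")  # 120

# Adding replicas grows client_conns, but the Postgres side stays capped.
```

This is why the advice is to scale replicas rather than per-pod pool size: the PgBouncer-to-Postgres side of the budget does not move.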
5) Reduce noisy writes if possible
- Very chatty logging multiplies DB writes. If logs are extremely verbose, consider reducing log volume or temporarily disabling logging-to-API during stress tests:
```
PREFECT_LOGGING_TO_API_ENABLED=false
```

itielOlenick (09/11/2025, 3:45 PM)
Marvin (09/11/2025, 3:50 PM)
Postgres/PgBouncer
```
[pgbouncer]
pool_mode = transaction
max_client_conn = 8000
default_pool_size = 100
reserve_pool_size = 20
server_reset_query = DISCARD ALL
ignore_startup_parameters = extra_float_digits
```
- Postgres max_connections should cover PgBouncer’s server-side pool needs (not client-side max). Monitor CPU, locks, checkpointing, and autovacuum.
Prefect Server/API
- Scale horizontally with multiple API replicas behind PgBouncer. Keep per-pod DB pools small.
- Run exactly one copy of Prefect “services” (scheduler, event persister, task_run_recorder, etc.). If every API replica runs services, DB connections and contention multiply rapidly.
- Recommended SQLAlchemy/Prefect DB settings for API pods:
```
# Use the PgBouncer DSN
PREFECT_SERVER_DATABASE_CONNECTION_URL=postgresql+asyncpg://user:pass@pgbouncer:6432/db
# Keep the pool small per process; let PgBouncer handle concurrency
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=10
PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=20
PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_TIMEOUT=60
PREFECT_SERVER_DATABASE_CONNECTION_TIMEOUT=5
# Critical with PgBouncer transaction pooling:
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__STATEMENT_CACHE_SIZE=0
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__PREPARED_STATEMENT_CACHE_SIZE=0
# Optional but handy for tracing in pg_stat_activity
PREFECT_SERVER_DATABASE_SQLALCHEMY_CONNECT_ARGS__APPLICATION_NAME=prefect-api
```
- If logs are very chatty, that alone can saturate Postgres at high concurrency. Consider reducing log volume or temporarily disabling logs-to-API to test throughput:
PREFECT_LOGGING_TO_API_ENABLED=false
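Combined with a lower log level, a stress-test logging profile might look like this (PREFECT_LOGGING_LEVEL is the standard Prefect log-level setting; WARNING is an assumed choice, tune to taste):

```
# Stress-test logging profile (sketch)
PREFECT_LOGGING_TO_API_ENABLED=false   # don't ship logs to the API at all
PREFECT_LOGGING_LEVEL=WARNING          # cut log volume at the source
```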
Why you only see ~1 CPU used by the server
- By default, prefect server start runs a single Uvicorn worker process with async concurrency. One worker process ≈ one CPU core utilized. Setting K8s CPU requests to 2 just reserves capacity — it does not create multi-process server workers.
- Prefect’s CLI doesn’t expose a “workers” argument. The recommended way to utilize multiple cores is to run multiple API replicas (pods) behind your Service/Ingress. This also gives you resilience and better horizontal scaling.
Using the Helm chart
- Scale API replicas: set the chart’s replica count (e.g., server.replicaCount) to multiple pods for the webserver.
- Run services exactly once:
- Option A: Start a single pod that runs everything (default) and N “web-only” pods by passing --no-services to those. Many chart versions expose extra args (e.g., server.extraArgs). If your chart does not, override the container args/command for that deployment.
- Option B: Deploy two Helm releases: one release with replicas: 1 (services ON), and a second release with replicas: N passing --no-services (web-only) for horizontal API scaling.
Example values snippet (adjust to your chart’s keys):
```
server:
  replicaCount: 4      # scale webserver horizontally
  extraArgs:
    - --no-services    # this deployment runs only the webserver
  env:
    - name: PREFECT_SERVER_DATABASE_CONNECTION_URL
      valueFrom:
        secretKeyRef:
          name: pgbouncer-dsn
          key: dsn
```
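Option B above could be sketched as two values files for two releases of the same chart (key names like `server.replicaCount` and `server.extraArgs` are assumptions; check your chart version):

```
# values-services.yaml: one replica, services ON (default full server)
server:
  replicaCount: 1

# values-web.yaml: N web-only replicas for horizontal API scaling
server:
  replicaCount: 4
  extraArgs:
    - --no-services    # webserver only; services run in the other release
```

Both releases should point at the same PgBouncer DSN so they share one database.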