<@ULVA73B9P> I would like to know all the db confi...
# ask-marvin
a
@Marvin I would like to know all the db config that is available in Prefect 2. My Prefect 2 is in AWS and the db is AWS RDS. At a time, 200-300 flows are executing across 4 Prefect server replicas. What would be the ideal db connection pool config?
m
thought for 544.6 seconds
Thanks for the details. Since you’re on Prefect 2.x (Orion), here’s what you can configure for the API/database and some guidance for sizing with AWS RDS at your scale.

**Prefect 2.x database settings (env vars)**

You can see these on your system with:
```
prefect config view --show-defaults
```
Key database-related settings:
- `PREFECT_API_DATABASE_CONNECTION_URL`
  - Default: `sqlite+aiosqlite:///${PREFECT_HOME}/prefect.db`
  - Set this to your Postgres DSN, e.g. `postgresql+asyncpg://USER:PASS@HOST:5432/DBNAME`
- `PREFECT_API_DATABASE_DRIVER`
  - Optional; inferred from the URL (e.g., `postgresql+asyncpg` or `sqlite+aiosqlite`)
- `PREFECT_API_DATABASE_HOST` / `PORT` / `USER` / `NAME` / `PASSWORD`
  - Optional; if provided, Prefect will build the connection URL for you
- `PREFECT_API_DATABASE_ECHO`
  - Default: False; set True to log SQL (debugging only)
- `PREFECT_API_DATABASE_MIGRATE_ON_START`
  - Default: True; runs migrations at API start
- `PREFECT_API_DATABASE_CONNECTION_TIMEOUT`
  - Default: 5.0 seconds; timeout for establishing new DB connections
- `PREFECT_API_DATABASE_TIMEOUT`
  - Default: 10.0 seconds; general DB operation timeout
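As a concrete (hypothetical) example, on the hosts or pods running the Prefect API these could be set as plain environment variables; the RDS endpoint, credentials, and database name below are placeholders, not values from your environment:

```bash
# Placeholder RDS endpoint, user, password, and database name; substitute your own.
export PREFECT_API_DATABASE_CONNECTION_URL="postgresql+asyncpg://prefect:CHANGE_ME@my-cluster.xxxx.us-east-1.rds.amazonaws.com:5432/prefect"

# Looser timeouts to ride out connection/query latency during bursts (values discussed below).
export PREFECT_API_DATABASE_CONNECTION_TIMEOUT=15
export PREFECT_API_DATABASE_TIMEOUT=30
```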
**Notes**
- For Postgres, Prefect 2.x uses SQLAlchemy + asyncpg; ensure the `pg_trgm` extension is available on your DB if you rely on searchable artifacts.
- The Prefect 2.x API does not expose SQLAlchemy pool controls (e.g., `pool_size`, `max_overflow`, `pool_timeout`, `pool_recycle`) as environment variables.

**Docs**
- Self-hosted/DB guidance: Prefect 2 hosting guide

**What to use for connection pooling (200–300 concurrent flows, 4 API replicas)**

Because application-level pool tuning is not exposed in 2.x, the typical and most reliable path at your scale is to add PgBouncer in front of RDS and let Prefect’s default pool use PgBouncer.

**Recommended starting point**
- PgBouncer (transaction pooling mode; see the config sketch after this list):
  - `default_pool_size`: 50 (per DB/user)
  - `reserve_pool_size`: 20
  - `max_client_conn`: size for expected bursts (e.g., 1000–2000)
  - `server_idle_timeout`: 300s
  - `server_reset_query_always`: 1 (for safety with long-lived connections)
- RDS Postgres:
  - Ensure `max_connections` comfortably exceeds PgBouncer’s server-side pools plus any other app usage.
  - Enable TCP keepalives (RDS defaults are usually fine).
  - Consider a modest `statement_timeout` (e.g., 60s–120s) in the parameter group if you want to bound runaway queries, but confirm it doesn’t interrupt valid long-running operations.
- Prefect API settings:
  - Keep the defaults for pooling (they’re internal), but bump connection timeouts to reduce flakiness during bursts:
    - `PREFECT_API_DATABASE_CONNECTION_TIMEOUT=10.0` or `15.0`
    - `PREFECT_API_DATABASE_TIMEOUT=30.0` (if you’ve observed slow queries at peak)
- TLS:
  - If your RDS instance requires TLS, ensure your DSN/driver enforces SSL (for asyncpg, TLS can be provided via connect args; otherwise ensure DB-level “require SSL” and security group rules). If you need help wiring TLS for asyncpg specifically, I can outline options based on your current connection method.
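To make the PgBouncer piece concrete, here is a minimal sketch of a `pgbouncer.ini` using the values above, written as a shell heredoc. The RDS endpoint, database name, listen port, and auth file path are placeholder assumptions, not details from your setup:

```bash
# Hypothetical paths and endpoint; the pool settings mirror the recommendations above.
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
; route the "prefect" database (placeholder name) to the RDS endpoint (placeholder host)
prefect = host=my-cluster.xxxx.us-east-1.rds.amazonaws.com port=5432 dbname=prefect

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

pool_mode = transaction
default_pool_size = 50
reserve_pool_size = 20
max_client_conn = 1000
server_idle_timeout = 300
server_reset_query_always = 1
EOF
```

With that in place, `PREFECT_API_DATABASE_CONNECTION_URL` would point at PgBouncer (port 6432 here) rather than at RDS directly.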
**Back-of-the-envelope capacity check**
- With 4 API replicas and Prefect’s default async SQLAlchemy pool (roughly ~5 base + ~10 overflow per process is typical for SQLAlchemy’s QueuePool defaults), you’re looking at a few dozen DB connections under load from Prefect itself, on the order of 4 × (5 + 10) = 60 at most.
- PgBouncer will smooth bursts when 200–300 flows transition states at once (scheduler, task recorder, logging, etc.) so the DB isn’t overwhelmed by many short-lived connections.
**If you cannot add PgBouncer**
It’s still workable at smaller scales, but at your concurrency you may see connect timeouts during bursts. In that case:
- Increase `PREFECT_API_DATABASE_CONNECTION_TIMEOUT` to 10–15s
- Ensure RDS `max_connections` is sufficient for peak bursts plus other workloads (you can check actual usage with the query sketched below)
- Scale up the RDS instance class if you observe high CPU/IO/lock contention
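If it helps, a quick way to compare actual usage against the RDS ceiling is a query on `pg_stat_activity`; the connection details below are placeholders:

```bash
# Placeholder DSN; run from any host that can reach RDS.
psql "host=my-cluster.xxxx.us-east-1.rds.amazonaws.com dbname=prefect user=prefect" -c "
  SELECT count(*)                            AS current_connections,
         current_setting('max_connections')  AS max_connections
  FROM pg_stat_activity;"
```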
**A few questions to tune more precisely**
- Which exact Prefect 2.x version are you on?
- Are your 4 “server replicas” single-process uvicorn instances, or do they run multiple workers per pod/container?
- Do you already run PgBouncer (or RDS Proxy) in front of RDS?
- Are you currently seeing DB-related errors (timeouts, “too many connections”, lock waits)? If so, can you share a snippet?

I’m happy to translate this into concrete Kubernetes/Helm env vars or Docker Compose changes once I know your deployment layout.
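For instance, if the replicas are a Kubernetes Deployment, the timeout bump alone could be applied roughly like this (the Deployment name is a placeholder, not something known about your cluster):

```bash
# Hypothetical Deployment name; adjust to your manifests or Helm values.
kubectl set env deployment/prefect-server \
  PREFECT_API_DATABASE_CONNECTION_TIMEOUT=15 \
  PREFECT_API_DATABASE_TIMEOUT=30
```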
a
@Marvin My RDS config is this: LEAST({DBInstanceClassMemory/9531392},5000). I will increase the timeout to 20 sec. I am not using PgBouncer. Would that work?
m
thought for 23.5 seconds
Short answer: probably yes. Bumping the DB timeouts can smooth over transient spikes, and with 4 Prefect 2 API replicas you’re unlikely to exhaust RDS connections if you’re using the default AWS formula. But keep an eye on latency under load.

A few specifics:
- Max connections: RDS’s default formula `LEAST(DBInstanceClassMemory/9531392, 5000)` generally yields plenty of headroom. Prefect 2’s API uses a modest pool per replica, so you’ll typically see on the order of a few dozen to ~100 total DB connections across 4 replicas.
- Timeouts: Increasing to 20s is reasonable. I’d suggest:
  - `PREFECT_API_DATABASE_CONNECTION_TIMEOUT=10-15` (connection establishment)
  - `PREFECT_API_DATABASE_TIMEOUT=20` (query/operation timeout)
- Without PgBouncer: This is workable at your scale, but you may see occasional spikes (connection setup or query latency) during heavy scheduling or logging bursts. If you start seeing “connection timeout” errors or slow queries at peak, consider RDS Proxy or PgBouncer later to reduce connection churn.

Quick checklist:
- Set the env vars on your API pods:
  ```
  PREFECT_API_DATABASE_CONNECTION_TIMEOUT=15
  PREFECT_API_DATABASE_TIMEOUT=20
  ```
- Monitor RDS CloudWatch: CPUUtilization, DatabaseConnections (and MaxUsed), Read/WriteIOPS, DiskQueueDepth, FreeableMemory, and Lock/Deadlock counts.
- Optional DB guardrails: consider `statement_timeout` (e.g., 60–120s) and `idle_in_transaction_session_timeout` to prevent stuck sessions; a sketch follows this list. Make sure these don’t interrupt legitimate long-running operations in your environment.
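One way to apply those guardrails at the database level rather than in the RDS parameter group is `ALTER DATABASE`; the DSN and database name are placeholders, and the values are just examples from the ranges above:

```bash
# Placeholder connection details; these settings apply to new sessions only.
psql "host=my-cluster.xxxx.us-east-1.rds.amazonaws.com dbname=prefect user=prefect" <<'SQL'
ALTER DATABASE prefect SET statement_timeout = '120s';
ALTER DATABASE prefect SET idle_in_transaction_session_timeout = '5min';
SQL
```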
One clarification: which “timeout” are you planning to set to 20s: `PREFECT_API_DATABASE_TIMEOUT`, `PREFECT_API_DATABASE_CONNECTION_TIMEOUT`, or both? I can tailor the recommendation based on that.

Reference: Prefect 2 (Orion) self-hosting guide