Abrar Hussain
09/29/2025, 3:40 PMMarvin
09/29/2025, 3:41 PMMarvin
09/29/2025, 3:50 PMprefect config view --show-defaults
Key database-related settings:
- PREFECT_API_DATABASE_CONNECTION_URL
- Default: sqlite+aiosqlite:///${PREFECT_HOME}/prefect.db
- Set this to your Postgres DSN, e.g. <postgresql+asyncpg://USER:PASS@HOST:5432/DBNAME>
- PREFECT_API_DATABASE_DRIVER
- Optional; inferred from the URL (e.g., postgresql+asyncpg
or sqlite+aiosqlite
)
- PREFECT_API_DATABASE_HOST / PORT / USER / NAME / PASSWORD
- Optional; if provided, Prefect will build the connection URL for you
- PREFECT_API_DATABASE_ECHO
- Default: False; set True to log SQL (debugging only)
- PREFECT_API_DATABASE_MIGRATE_ON_START
- Default: True; runs migrations at API start
- PREFECT_API_DATABASE_CONNECTION_TIMEOUT
- Default: 5.0 seconds; timeout for establishing new DB connections
- PREFECT_API_DATABASE_TIMEOUT
- Default: 10.0 seconds; general DB operation timeout
Notes
- For Postgres, Prefect 2.x uses SQLAlchemy + asyncpg; ensure the pg_trgm
extension is available on your DB if you rely on searchable artifacts.
- The Prefect 2.x API does not expose SQLAlchemy pool controls (e.g., pool_size, max_overflow, pool_timeout, pool_recycle) as environment variables.
Docs
- Self-hosted/DB guidance: Prefect 2 hosting guide
What to use for connection pooling (200–300 concurrent flows, 4 API replicas)
Because application-level pool tuning is not exposed in 2.x, the typical and most reliable path at your scale is to add PgBouncer in front of RDS and let Prefect’s default pool use PgBouncer.
Recommended starting point
- PgBouncer (transaction pooling mode):
- default_pool_size: 50 (per DB/user)
- reserve_pool_size: 20
- max_client_conn: size for expected bursts (e.g., 1000–2000)
- server_idle_timeout: 300s
- server_reset_query_always: 1 (for safety with long-lived connections)
- RDS Postgres:
- Ensure max_connections comfortably exceeds PgBouncer’s server-side pools plus any other app usage.
- Enable TCP keepalives (RDS defaults are usually fine).
- Consider a modest statement_timeout (e.g., 60s–120s) in the parameter group if you want to bound runaway queries, but confirm it doesn’t interrupt valid long-running operations.
- Prefect API settings:
- Keep the defaults for pooling (they’re internal), but bump connection timeouts to reduce flakiness during bursts:
- PREFECT_API_DATABASE_CONNECTION_TIMEOUT=10.0
or 15.0
- PREFECT_API_DATABASE_TIMEOUT=30.0
(if you’ve observed slow queries at peak)
- TLS:
- If your RDS instance requires TLS, ensure your DSN/driver enforces SSL (for asyncpg, TLS can be provided via connect args; otherwise ensure DB-level “require SSL” and security group rules). If you need help wiring TLS for asyncpg specifically, I can outline options based on your current connection method.
Back-of-the-envelope capacity check
- With 4 API replicas and Prefect’s default async SQLAlchemy pool (roughly ~5 base + ~10 overflow per process is typical for SQLAlchemy’s QueuePool defaults), you’re looking at a few dozen DB connections under load from Prefect itself.
- PgBouncer will smooth bursts when 200–300 flows transition states at once (scheduler, task recorder, logging, etc.) so the DB isn’t overwhelmed by many short-lived connections.
If you cannot add PgBouncer
- It’s still workable at smaller scales, but at your concurrency you may see connect timeouts during bursts. In that case:
- Increase PREFECT_API_DATABASE_CONNECTION_TIMEOUT
to 10–15s
- Ensure RDS max_connections is sufficient for peak bursts plus other workloads
- Scale up the RDS instance class if you observe high CPU/IO/lock contention
A few questions to tune more precisely
- Which exact Prefect 2.x version are you on?Marvin
09/29/2025, 3:50 PMAbrar Hussain
09/29/2025, 3:58 PMMarvin
09/29/2025, 3:58 PMMarvin
09/29/2025, 3:59 PMPREFECT_API_DATABASE_CONNECTION_TIMEOUT=10-15
(connection establishment)
- PREFECT_API_DATABASE_TIMEOUT=20
(query/operation timeout)
- Without PgBouncer: This is workable at your scale, but you may see occasional spikes (connection setup or query latency) during heavy scheduling or logging bursts. If you start seeing “connection timeout” or slow queries at peak, consider RDS Proxy or PgBouncer later to reduce connection churn.
Quick checklist:
- Set the env vars on your API pods:
PREFECT_API_DATABASE_CONNECTION_TIMEOUT=15
PREFECT_API_DATABASE_TIMEOUT=20
- Monitor RDS CloudWatch: CPUUtilization, DatabaseConnections (and MaxUsed), Read/WriteIOPS, DiskQueueDepth, FreeableMemory, and Lock/Deadlock counts.
- Optional DB guardrails: consider statement_timeout
(e.g., 60–120s) and idle_in_transaction_session_timeout
to prevent stuck sessions. Make sure these don’t interrupt legitimate long-running operations in your environment.
One clarification: which “timeout” are you planning to set to 20s — PREFECT_API_DATABASE_TIMEOUT
, PREFECT_API_DATABASE_CONNECTION_TIMEOUT
, or both? I can tailor the recommendation based on that.
Reference: Prefect 2 (Orion) self-hosting guide