Oliver Tedder
10/02/2025, 1:04 PM

Marvin
10/02/2025, 1:17 PM
- Tighten the events system (its in-memory buffers and websocket backfills are the usual memory consumers):
    PREFECT_SERVER_EVENTS_MAXIMUM_SIZE_BYTES=1000000         # ~1 MB in-memory event buffer cap
    PREFECT_SERVER_EVENTS_RETENTION_PERIOD=1d                # keep only 1 day in the live cache
    PREFECT_SERVER_EVENTS_STREAM_OUT_ENABLED=false           # disable live push if you can live without it
    PREFECT_SERVER_EVENTS_WEBSOCKET_BACKFILL_PAGE_SIZE=100   # reduce backfill chunking
    PREFECT_SERVER_EVENTS_MAXIMUM_WEBSOCKET_BACKFILL=2m      # trim backfill time window
    Settings reference: see the settings reference page in the Prefect 3 docs
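  These can be exported in the server process's environment, or persisted to the active profile with the Prefect CLI. A minimal sketch (values mirror the list above; profile settings take effect when the server starts under that profile):
    # Persist two of the settings above to the active Prefect profile
    prefect config set PREFECT_SERVER_EVENTS_RETENTION_PERIOD='1d'
    prefect config set PREFECT_SERVER_EVENTS_STREAM_OUT_ENABLED='false'
    # Verify what the server will pick up, and where each value comes from
    prefect config view --show-sources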
- Reduce DB connection pressure (each pooled connection costs memory on both the server and Postgres; pool_size plus max_overflow caps connections per process, so the values below allow at most 10)
  PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_SIZE=5
  PREFECT_SERVER_DATABASE_SQLALCHEMY_MAX_OVERFLOW=5
  PREFECT_SERVER_DATABASE_SQLALCHEMY_POOL_RECYCLE=1800
- If logs are very chatty, throttle log volume to the API
  - Prefer higher log levels for flows/tasks or send fewer logs to the API (lots of logs require buffering and DB writes by the server).
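  For example, in the environment of your flow runs (not the server), raise the level that gets shipped to the API, or turn off API log shipping entirely. The setting names below assume Prefect 3's standard logging settings; check them against your version's settings reference:
    # Ship only WARNING and above to the API (the default level is INFO)
    PREFECT_LOGGING_LEVEL=WARNING
    # Or stop sending logs to the API entirely; they still print locally
    PREFECT_LOGGING_TO_API_ENABLED=false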
- Enable server metrics so you can correlate memory with events/logs/load
  PREFECT_SERVER_METRICS_ENABLED=true
  Then scrape /metrics with Prometheus/Grafana for memory and connection metrics.
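  A minimal Prometheus scrape job for this, assuming the server listens on its default port 4200 (depending on version the metrics path may sit under the API prefix, e.g. /api/metrics):
    scrape_configs:
      - job_name: prefect-server
        metrics_path: /metrics                 # or /api/metrics on some versions
        static_configs:
          - targets: ["prefect-server:4200"]   # hypothetical hostname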
Scale and process isolation (recommended)
- Recent releases added support for vertically scaling the server with multiple workers, giving better isolation and resilience; upgrading lets you run several web workers and recycle them to avoid cumulative memory growth over very long uptimes.
  - See “Vertical scale prefect server with workers” in the release notes: 3.4.20 changelog
  - After upgrading, configure multiple API workers and request/connection limits as documented to keep per‑process memory bounded.
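  If you deploy the ASGI app yourself rather than via prefect server start, you can get the same effect with gunicorn's worker recycling. A sketch, assuming the app factory path below is correct for your Prefect version (confirm it before relying on this):
    # Run several isolated API processes; each is recycled after ~10k requests
    # (with jitter so they don't all restart at once), capping slow RSS growth
    gunicorn "prefect.server.api.server:create_app()" \
        -k uvicorn.workers.UvicornWorker \
        --workers 4 --bind 0.0.0.0:4200 \
        --max-requests 10000 --max-requests-jitter 1000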
Operational guardrails
- If you run the server under systemd or Kubernetes, add guardrails so it can never climb to 20 GB again:
  - systemd example (in the unit's [Service] section):
    [Service]
    MemoryMax=8G
    Restart=always
  - Kubernetes example: set memory limits/requests and a liveness probe so the pod gets restarted before memory becomes problematic (see the sketch after this list).
- Avoid keeping many UI tabs open on “Live” pages for long periods (each maintains websockets and subscriptions).
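A minimal Kubernetes sketch of those guardrails; the container name, image tag, and resource numbers are illustrative, and the probe targets the server's /api/health endpoint:
    # Pod spec fragment
    containers:
      - name: prefect-server
        image: prefecthq/prefect:3-latest     # illustrative tag
        resources:
          requests:
            memory: "2Gi"
          limits:
            memory: "8Gi"                     # OOM-kill well before 20 GB
        livenessProbe:
          httpGet:
            path: /api/health                 # Prefect server health endpoint
            port: 4200
          initialDelaySeconds: 30
          periodSeconds: 15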
Why this helps
- The defaults favor a responsive UI with days of live event backfill and generous buffering; under steady or bursty load, the in‑memory event/log caches and active websocket backfills can steadily grow the process RSS.
- Tightening the events cache, reducing the websocket backfill size, and capping the DB pool size prevent that accumulation.
- Multiple API workers and recycling further prevent slow, monotonic RSS growth in a single long‑lived process.
Useful links
- Self-hosting overview: the "Self-hosted Prefect server" page in the Prefect docs