Patricio Navarro
08/19/2025, 10:44 AM
Marvin
08/19/2025, 10:46 AM
1) Find the largest tables
SELECT
    relname AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
2) Prune high‑churn data first
- Events: keep the last N days/weeks.
- Logs: keep the last N days/weeks.
- Artifacts: if you don’t need historical artifacts, prune older ones.
- Old completed flow/task runs: if you’re comfortable removing historic runs, delete runs older than your retention target.
Do it in batches to avoid long locks. Example pattern (adjust table/column names based on your schema; preview with SELECT first):
-- Preview what you'd delete (adjust table/column names)
SELECT count(*) FROM events WHERE occurred < NOW() - INTERVAL '30 days';
-- Batch delete to limit lock time
WITH doomed AS (
    SELECT id
    FROM events
    WHERE occurred < NOW() - INTERVAL '30 days'
    LIMIT 10000
)
DELETE FROM events e
USING doomed
WHERE e.id = doomed.id;
Repeat the batch until count hits zero. Use the same pattern for logs/artifacts/old runs (with their date columns). If you prefer a blunt instrument and can afford losing old telemetry, you can TRUNCATE those tables instead, but that’s… blunt.
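The repeat-until-zero loop is easy to script. A sketch in shell, assuming the placeholder `events`/`occurred` names from the example above and the usual PG* env vars (or -h/-p/-U/-d flags) for the connection:

```shell
#!/usr/bin/env bash
# Drive the batched DELETE in a loop until no rows remain.
# "events" and "occurred" are placeholders; adjust to your schema.
set -u

BATCH_SQL="
WITH doomed AS (
    SELECT id FROM events
    WHERE occurred < NOW() - INTERVAL '30 days'
    LIMIT 10000
)
DELETE FROM events e USING doomed WHERE e.id = doomed.id
RETURNING 1;"

prune_events() {
    local deleted
    while :; do
        # -qtA: quiet, tuples only, unaligned => one output line per deleted row
        deleted=$(psql -qtA -c "$BATCH_SQL" | wc -l | tr -d ' ')
        echo "deleted $deleted rows"
        [ "$deleted" -eq 0 ] && break
        sleep 1  # brief pause so autovacuum and other sessions get some air
    done
}

# Run it: prune_events
```

The `RETURNING 1` plus `wc -l` trick is just a portable way to count affected rows from the shell.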
3) Reclaim space
After large deletes, reclaim storage:
- Minimal downtime: pg_repack rewrites the table online without a long exclusive lock (recommended).
- With a lock: VACUUM (FULL) reclaims space but holds an ACCESS EXCLUSIVE lock for the duration; run ANALYZE afterwards to refresh planner stats.
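For concreteness, both options might look like this. `prefect` and `events` are placeholder database/table names, and the commands are only echoed so you can review them before running anything:

```shell
# Placeholder names -- substitute your database and your largest tables
DB=prefect
TABLE=events

# Option A: pg_repack rewrites the table online (needs the pg_repack
# extension installed in the DB and the CLI on this host)
REPACK_CMD="pg_repack -d $DB --table $TABLE"

# Option B: VACUUM FULL holds an ACCESS EXCLUSIVE lock for the whole run.
# Note the two separate -c flags: VACUUM can't run inside the implicit
# transaction that a multi-statement -c string would create.
VACUUM_CMD="psql -d $DB -c 'VACUUM (FULL, VERBOSE) $TABLE;' -c 'ANALYZE $TABLE;'"

# Echoed for review; paste (or eval) once you're happy with them
echo "$REPACK_CMD"
echo "$VACUUM_CMD"
```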
4) Set a retention policy so it doesn’t happen again
Decide how much history you actually need and schedule a regular job (cron/K8s CronJob) to prune:
- Events older than N days
- Logs older than N days
- Completed runs older than N days
- Optionally, artifacts older than N days
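A minimal sketch of such a job. The table names, date columns, and retention windows below are placeholder assumptions, and `DRY_RUN=1` (the default here) only prints the SQL instead of executing it:

```shell
#!/usr/bin/env bash
# Scheduled retention sketch. Adjust the triples to your schema; for very
# large tables, prefer the batched DELETE pattern from step 2.
set -u
DRY_RUN=${DRY_RUN:-1}

# "table column retention" triples (placeholders)
RETENTION="
events     occurred  30 days
logs       timestamp 30 days
flow_runs  created   90 days
"

run_retention() {
    echo "$RETENTION" | while read -r table column n unit; do
        [ -z "$table" ] && continue
        sql="DELETE FROM $table WHERE $column < NOW() - INTERVAL '$n $unit';"
        if [ "$DRY_RUN" = 1 ]; then
            echo "$sql"
        else
            psql -qc "$sql"
        fi
    done
}

# Example crontab entry (daily at 03:00): 0 3 * * * /path/to/this-script.sh
run_retention
```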
5) If you really want a clean slate
There are admin endpoints to clear or drop the database if a full reset is acceptable for your environment:
- Admin: clear database
- Admin: drop database
Obviously, these are for “start over” scenarios.
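For self-hosted Prefect, the same reset is also exposed on the CLI, assuming a recent 2.x/3.x install where `prefect server database reset` exists. It is destructive, so it's only echoed here:

```shell
# Drops and recreates all Prefect tables in the configured database.
# Echoed for review; remove the echo (and keep -y to skip the
# confirmation prompt) only when you truly want a clean slate.
RESET_CMD="prefect server database reset -y"
echo "$RESET_CMD"
```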
Helpful reading:
- Database maintenance and size management
If you share:
- Prefect version (assuming 3.x unless you say otherwise),
- whether you’re on Cloud or self-hosted,
- and the top 5 largest tables from step 1,
…I’ll give you exact delete statements tailored to your schema and a simple scheduled cleanup script. Otherwise, I can guess, but you probably don’t want me guessing at your database.
Patricio Navarro
08/19/2025, 12:44 PM
Marvin
08/19/2025, 12:45 PM
Logical backup (pg_dump)
- Dump the database:
# Set connection info or pass -h/-p/-U/-d explicitly
export PGHOST=<host> PGPORT=<port> PGUSER=<user> PGDATABASE=prefect
# Compressed, portable dump
pg_dump -Fc -Z 9 -f prefect_$(date +%F).dump
- Dump roles and other globals (needed for full restore on a new cluster):
pg_dumpall --globals-only > globals_$(date +%F).sql
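Wrapped into a nightly job, the two dumps above might look like this. `BACKUP_DIR` and the 14-day dump retention are assumptions, and the commands are echoed so you can review them before wiring the script into cron:

```shell
#!/usr/bin/env bash
# Nightly logical backup: dated DB dump + globals dump + prune old dumps.
set -u
BACKUP_DIR=${BACKUP_DIR:-/var/backups/prefect}  # assumed location
STAMP=$(date +%F)

DUMP_CMD="pg_dump -Fc -Z 9 -f $BACKUP_DIR/prefect_$STAMP.dump"
GLOBALS_CMD="pg_dumpall --globals-only -f $BACKUP_DIR/globals_$STAMP.sql"
PRUNE_CMD="find $BACKUP_DIR -name '*.dump' -mtime +14 -delete"

# Echoed for review; swap echo for eval (or drop the variables) to run.
echo "$DUMP_CMD"
echo "$GLOBALS_CMD"
echo "$PRUNE_CMD"
```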
Restore
# 1) Restore globals (roles, etc.) onto the target cluster
psql -f globals_YYYY-MM-DD.sql
# 2) Create empty DB (if needed)
createdb prefect
# 3) Restore data (parallel apply speeds this up)
pg_restore -d prefect --clean --if-exists -j 4 prefect_YYYY-MM-DD.dump
Notes
- pg_dump is consistent without locking out writers; it uses MVCC snapshots.
- Restore to the same or newer major PG version for smooth sailing.
- Keep both the DB dump and the globals dump if you may rebuild on a fresh cluster.
Physical backup (for point-in-time recovery)
- Requires a replication role and streams WAL; good for very large DBs or PITR:
pg_basebackup -h <host> -p <port> -U <replication_user> -D /path/to/backup -Ft -z -P --wal-method=stream
Container/Kubernetes examples
- Docker:
# Dump to stdout and redirect locally
docker exec -e PGPASSWORD=$POSTGRES_PASSWORD <postgres-container> \
  pg_dump -U $POSTGRES_USER -d $POSTGRES_DB -Fc -Z 9 > prefect.dump
- Kubernetes:
kubectl exec <postgres-pod> -- bash -lc \
  'pg_dump -U $POSTGRES_USER -d $POSTGRES_DB -Fc -Z 9' > prefect.dump
Managed services
- Prefer native snapshots (RDS/Cloud SQL/Aurora) for full-cluster backups; use pg_dump for logical exports.
Docs, if you enjoy light bedtime reading:
- pg_dump
- pg_dumpall
- pg_restore
- pg_basebackup
If you tell me:
- how you’re running Postgres (Docker, K8s, managed),
- DB name, and
- whether you need PITR or just nightly snapshots,
I’ll give you a copy‑pasteable backup + restore script for your setup.