Patricio Navarro
08/19/2025, 10:44 AM
Marvin
08/19/2025, 10:46 AM
1) Find the largest tables
SELECT
    relname AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
2) Prune high‑churn data first
- Events: keep the last N days/weeks.
- Logs: keep the last N days/weeks.
- Artifacts: if you don’t need historical artifacts, prune older ones.
- Old completed flow/task runs: if you’re comfortable removing historic runs, delete runs older than your retention target.
Do it in batches to avoid long locks. Example pattern (adjust table/column names based on your schema; preview with SELECT first):
-- Preview what you'd delete (adjust table/column names)
SELECT count(*) FROM events WHERE occurred < NOW() - INTERVAL '30 days';
-- Batch delete to limit lock time
WITH doomed AS (
    SELECT id
    FROM events
    WHERE occurred < NOW() - INTERVAL '30 days'
    LIMIT 10000
)
DELETE FROM events e
USING doomed
WHERE e.id = doomed.id;
Repeat the batch until count hits zero. Use the same pattern for logs/artifacts/old runs (with their date columns). If you prefer a blunt instrument and can afford losing old telemetry, you can TRUNCATE those tables instead, but that’s… blunt.
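The repeat-until-zero loop is easy to script. A sketch in shell, assuming the placeholder `events`/`occurred` names from the example above and the usual PG* env vars (or -h/-p/-U/-d flags) for the connection:

```shell
#!/usr/bin/env bash
# Drive the batched DELETE in a loop until no rows remain.
# "events" and "occurred" are placeholders; adjust to your schema.
set -u

BATCH_SQL="
WITH doomed AS (
    SELECT id FROM events
    WHERE occurred < NOW() - INTERVAL '30 days'
    LIMIT 10000
)
DELETE FROM events e USING doomed WHERE e.id = doomed.id
RETURNING 1;"

prune_events() {
    local deleted
    while :; do
        # -qtA: quiet, tuples only, unaligned => one output line per deleted row
        deleted=$(psql -qtA -c "$BATCH_SQL" | wc -l | tr -d ' ')
        echo "deleted $deleted rows"
        [ "$deleted" -eq 0 ] && break
        sleep 1  # brief pause so autovacuum and other sessions get some air
    done
}

# Run it: prune_events
```

The `RETURNING 1` plus `wc -l` trick is just a portable way to count affected rows from the shell.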
3) Reclaim space
After large deletes, reclaim storage:
- Minimal downtime: pg_repack rewrites the table online without a long exclusive lock (recommended).
- With a lock: VACUUM (FULL) reclaims space but holds an ACCESS EXCLUSIVE lock for the duration; run ANALYZE afterwards to refresh planner stats.
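For concreteness, both options might look like this. `prefect` and `events` are placeholder database/table names, and the commands are only echoed so you can review them before running anything:

```shell
# Placeholder names -- substitute your database and your largest tables
DB=prefect
TABLE=events

# Option A: pg_repack rewrites the table online (needs the pg_repack
# extension installed in the DB and the CLI on this host)
REPACK_CMD="pg_repack -d $DB --table $TABLE"

# Option B: VACUUM FULL holds an ACCESS EXCLUSIVE lock for the whole run.
# Note the two separate -c flags: VACUUM can't run inside the implicit
# transaction that a multi-statement -c string would create.
VACUUM_CMD="psql -d $DB -c 'VACUUM (FULL, VERBOSE) $TABLE;' -c 'ANALYZE $TABLE;'"

# Echoed for review; paste (or eval) once you're happy with them
echo "$REPACK_CMD"
echo "$VACUUM_CMD"
```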
4) Set a retention policy so it doesn’t happen again
Decide how much history you actually need and schedule a regular job (cron/K8s CronJob) to prune:
- Events older than N days
- Logs older than N days
- Completed runs older than N days
- Optionally, artifacts older than N days
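A minimal sketch of such a job. The table names, date columns, and retention windows below are placeholder assumptions, and `DRY_RUN=1` (the default here) only prints the SQL instead of executing it:

```shell
#!/usr/bin/env bash
# Scheduled retention sketch. Adjust the triples to your schema; for very
# large tables, prefer the batched DELETE pattern from step 2.
set -u
DRY_RUN=${DRY_RUN:-1}

# "table column retention" triples (placeholders)
RETENTION="
events     occurred  30 days
logs       timestamp 30 days
flow_runs  created   90 days
"

run_retention() {
    echo "$RETENTION" | while read -r table column n unit; do
        [ -z "$table" ] && continue
        sql="DELETE FROM $table WHERE $column < NOW() - INTERVAL '$n $unit';"
        if [ "$DRY_RUN" = 1 ]; then
            echo "$sql"
        else
            psql -qc "$sql"
        fi
    done
}

# Example crontab entry (daily at 03:00): 0 3 * * * /path/to/this-script.sh
run_retention
```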
5) If you really want a clean slate
There are admin endpoints to clear or drop the database if a full reset is acceptable for your environment:
- Admin: clear database
- Admin: drop database
Obviously, these are for “start over” scenarios.
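For self-hosted Prefect, the same reset is also exposed on the CLI, assuming a recent 2.x/3.x install where `prefect server database reset` exists. It is destructive, so it's only echoed here:

```shell
# Drops and recreates all Prefect tables in the configured database.
# Echoed for review; remove the echo (and keep -y to skip the
# confirmation prompt) only when you truly want a clean slate.
RESET_CMD="prefect server database reset -y"
echo "$RESET_CMD"
```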
Helpful reading:
- Database maintenance and size management
If you share:
- Prefect version (assuming 3.x unless you say otherwise),
- whether you’re on Cloud or self-hosted,
- and the top 5 largest tables from step 1,
…I’ll give you exact delete statements tailored to your schema and a simple scheduled cleanup script. Otherwise, I can guess, but you probably don’t want me guessing at your database.
Patricio Navarro
08/19/2025, 12:44 PM
Marvin
08/19/2025, 12:45 PM
Logical backup (pg_dump)
- Dump the database:
# Set connection info or pass -h/-p/-U/-d explicitly
export PGHOST=<host> PGPORT=<port> PGUSER=<user> PGDATABASE=prefect
# Compressed, portable dump
pg_dump -Fc -Z 9 -f prefect_$(date +%F).dump
- Dump roles and other globals (needed for full restore on a new cluster):
pg_dumpall --globals-only > globals_$(date +%F).sql
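Wrapped into a nightly job, the two dumps above might look like this. `BACKUP_DIR` and the 14-day dump retention are assumptions, and the commands are echoed so you can review them before wiring the script into cron:

```shell
#!/usr/bin/env bash
# Nightly logical backup: dated DB dump + globals dump + prune old dumps.
set -u
BACKUP_DIR=${BACKUP_DIR:-/var/backups/prefect}  # assumed location
STAMP=$(date +%F)

DUMP_CMD="pg_dump -Fc -Z 9 -f $BACKUP_DIR/prefect_$STAMP.dump"
GLOBALS_CMD="pg_dumpall --globals-only -f $BACKUP_DIR/globals_$STAMP.sql"
PRUNE_CMD="find $BACKUP_DIR -name '*.dump' -mtime +14 -delete"

# Echoed for review; swap echo for eval (or drop the variables) to run.
echo "$DUMP_CMD"
echo "$GLOBALS_CMD"
echo "$PRUNE_CMD"
```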
Restore
# 1) Restore globals (roles, etc.) onto the target cluster
psql -f globals_YYYY-MM-DD.sql
# 2) Create empty DB (if needed)
createdb prefect
# 3) Restore data (parallel apply speeds this up)
pg_restore -d prefect --clean --if-exists -j 4 prefect_YYYY-MM-DD.dump
Notes
- pg_dump is consistent without locking out writers; it uses MVCC snapshots.
- Restore to the same or newer major PG version for smooth sailing.
- Keep both the DB dump and the globals dump if you may rebuild on a fresh cluster.
Physical backup (for point-in-time recovery)
- Requires a replication role and streams WAL; good for very large DBs or PITR:
pg_basebackup -h <host> -p <port> -U <replication_user> -D /path/to/backup -Ft -z -P --wal-method=stream
Container/Kubernetes examples
- Docker:
# Dump to stdout and redirect locally
docker exec -e PGPASSWORD=$POSTGRES_PASSWORD <postgres-container> \
  pg_dump -U $POSTGRES_USER -d $POSTGRES_DB -Fc -Z 9 > prefect.dump
- Kubernetes:
kubectl exec <postgres-pod> -- bash -lc \
  'pg_dump -U $POSTGRES_USER -d $POSTGRES_DB -Fc -Z 9' > prefect.dump
Managed services
- Prefer native snapshots (RDS/Cloud SQL/Aurora) for full-cluster backups; use pg_dump for logical exports.
Docs, if you enjoy light bedtime reading:
- pg_dump
- pg_dumpall
- pg_restore
- pg_basebackup
If you tell me:
- how you’re running Postgres (Docker, K8s, managed),
- DB name, and
- whether you need PITR or just nightly snapshots,
I’ll give you a copy‑pasteable backup + restore script for your setup.