# ask-marvin
z
@Marvin we are using self-hosted Prefect (version 3.4.7) with a Postgres db (version 14.18) behind it. Some tables (such as deployment, automation_event_follower, work_queue, automation_bucket) have under 100 live rows but between 100,000 and 1,000,000 dead tuples. Running `VACUUM ANALYZE [table name]` is not significantly dropping the dead tuple counts. Before we run `pg_repack`, or if that doesn't resolve it, `VACUUM FULL [table name]`, we want to attempt to fix the issue by modifying Prefect configs. Do you have any suggestions? Additionally, can you help us understand why so many dead tuples are forming and why they don't seem to decrease?
m
thought for 666.6 seconds
Thanks for the detailed context — this pattern (very few live rows but huge dead tuples) is classic for "hot" control tables that are updated constantly. In Prefect 3.x, the tables you named are written to very frequently:
- work_queue: updated on every worker poll and by server services that mark queues ready/not-ready and emit status events
- deployment: updated by the scheduler/foreman loops (e.g., marking deployments ready/not-ready, tracking "recent deployments")
- automation_event_follower and automation_bucket: updated as events are processed and automations evaluate/advance their positions

Why so many dead tuples form
- PostgreSQL creates a new row version on every UPDATE. Small tables that get updated every few seconds will accumulate dead tuples rapidly even if the number of "live" rows stays ~constant.
- If autovacuum can't clean up (e.g., because of long-running transactions or replication settings holding back the xmin horizon), those dead tuples linger and the table/indexes bloat.
- VACUUM (and VACUUM ANALYZE) does not shrink table files; it only marks space reusable. If something is preventing cleanup, VACUUM will report it (see "cannot remove… old snapshot" messages with VERBOSE).

Prefect-side knobs to reduce write churn (3.4.7)
If you can tolerate a bit more latency in status visibility/scheduling, these reduce how often Prefect updates those tables:

1) Workers (reduces writes to work_queue and related status)
- Increase the worker polling interval and heartbeat:
  - PREFECT_WORKER_QUERY_SECONDS (default ~10): increase to 30–60
  - PREFECT_WORKER_HEARTBEAT_SECONDS (default ~30): increase to 60–120
- Tradeoff: slower pickup of new work and slower unhealthy detection
- Docs: Worker healthchecks

2) Server services loops (reduces periodic updates to work_queue/deployment)
- Foreman loop (marks work pools/queues/deployments ready/unready and emits status):
  - PREFECT_SERVER_SERVICES_FOREMAN_LOOP_SECONDS (default 15): increase to 30–60
- Scheduler loop (creates scheduled runs, maintains "recent deployments"):
  - PREFECT_SERVER_SERVICES_SCHEDULER_LOOP_SECONDS (default 60): increase to 120+
  - PREFECT_SERVER_SERVICES_SCHEDULER_RECENT_DEPLOYMENTS_LOOP_SECONDS (default 5): increase to 15–30
- These settings are shown in Prefect's settings models:
  - Foreman: loop_seconds, work_queue_last_polled_timeout_seconds
  - Scheduler: loop_seconds, recent_deployments_loop_seconds
- Settings reference: Settings reference

3) Events / Automations (targets automation_* tables)
- If you do not rely on automations or event persistence, consider disabling them to stop nearly all writes to automation_* and event tables:
  - PREFECT_SERVER_SERVICES_TRIGGERS_ENABLED=false (stops the triggers/automations service)
  - PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=false (stops persisting events)
  - PREFECT_SERVER_EVENTS_STREAM_OUT_ENABLED=false (stops streaming events out)
  - Tradeoff: you'll lose automations and/or event history/streaming
- If you need automations, reduce the cadence and batch more:
  - PREFECT_SERVER_EVENTS_PROACTIVE_GRANULARITY (default 5s): increase to 15–30s to reduce bucketing churn
  - PREFECT_SERVER_SERVICES_EVENT_PERSISTER_FLUSH_INTERVAL (default 5s): increase to 15–30s to write in larger/less-frequent batches
- Relevant docs:
  - Automations models: Automations API
  - Events settings: Settings reference

Example docker-compose overrides
```
# Worker(s)
PREFECT_WORKER_QUERY_SECONDS=60
PREFECT_WORKER_HEARTBEAT_SECONDS=120

# Server services
PREFECT_SERVER_SERVICES_FOREMAN_LOOP_SECONDS=60
PREFECT_SERVER_SERVICES_SCHEDULER_LOOP_SECONDS=120
PREFECT_SERVER_SERVICES_SCHEDULER_RECENT_DEPLOYMENTS_LOOP_SECONDS=30

# Events / automations (choose based on needs)
# Disable entirely:
# PREFECT_SERVER_SERVICES_TRIGGERS_ENABLED=false
# PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=false
# PREFECT_SERVER_EVENTS_STREAM_OUT_ENABLED=false
# Or just reduce cadence:
PREFECT_SERVER_EVENTS_PROACTIVE_GRANULARITY=30
PREFECT_SERVER_SERVICES_EVENT_PERSISTER_FLUSH_INTERVAL=30
```
Why VACUUM ANALYZE didn’t “drop the counts”
- If you’re reading n_dead_tup from pg_stat_user_tables, it’s approximate and only refreshed by ANALYZE/autoanalyze. After VACUUM ANALYZE you should see the estimate update — unless VACUUM could not clean or the stats collector hasn’t updated yet.
- Most importantly, VACUUM cannot remove dead tuples if:
  - There are long-running transactions (including idle in transaction) holding old snapshots
  - A replica with hot_standby_feedback=on has long-running queries
  - Logical replication slots or walsenders are holding xmin back
- Check with these queries:
```sql
-- Dead tuple / vacuum stats for the tables in question
SELECT relname, n_live_tup, n_dead_tup, vacuum_count, autovacuum_count,
       last_vacuum, last_autovacuum, analyze_count, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('deployment','automation_event_follower','work_queue','automation_bucket');

-- Any long-running transactions
SELECT pid, usename, state, xact_start, now()-xact_start AS xact_age, query
FROM pg_stat_activity
WHERE datname = current_database() AND xact_start IS NOT NULL
ORDER BY xact_start;

-- Autovacuum activity/progress
SELECT * FROM pg_stat_progress_vacuum;

-- Replication/slots that can pin xmin
SELECT * FROM pg_stat_replication;
SELECT slot_name, slot_type, active, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;

-- See what VACUUM says explicitly
VACUUM (VERBOSE, ANALYZE) public.work_queue;
```
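One extra check that can help here (a sketch): `backend_xmin` in pg_stat_activity shows which sessions are holding the cleanup horizon back, which is more direct than inferring it from `xact_start` alone:
```sql
-- Sessions pinning the xmin horizon, oldest first; if xmin_age keeps growing
-- for a session, that session is what VACUUM is waiting on
SELECT pid, usename, state, backend_xmin, age(backend_xmin) AS xmin_age, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
```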
PostgreSQL-side mitigations for these specific “hot but tiny” tables
Even after reducing Prefect write frequency, tuning Postgres usually makes the biggest difference for this pattern:

- Per-table autovacuum thresholds (trigger vacuums sooner)
```sql
ALTER TABLE public.work_queue SET (
  autovacuum_vacuum_scale_factor  = 0.01,  -- 1% of the table's rows
  autovacuum_vacuum_threshold     = 50,    -- or even lower for tiny tables
  autovacuum_analyze_scale_factor = 0.02
);
ALTER TABLE public.deployment SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_threshold = 50);
ALTER TABLE public.automation_event_follower SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_threshold = 50);
ALTER TABLE public.automation_bucket SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_threshold = 50);
```
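To confirm the per-table storage parameters took effect, you can read them back from the catalog (a quick sanity check):
```sql
-- reloptions shows any per-table autovacuum/fillfactor overrides
SELECT relname, reloptions
FROM pg_class
WHERE relname IN ('deployment','automation_event_follower','work_queue','automation_bucket');
```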
- Enable HOT updates by leaving room on pages (reduces index churn on frequent updates)
```sql
ALTER TABLE public.work_queue SET (fillfactor = 80);
ALTER TABLE public.deployment SET (fillfactor = 80);

-- Rewrite once to apply fillfactor to all pages
VACUUM FULL public.work_queue;   -- or pg_repack to avoid long locks
VACUUM FULL public.deployment;
```
- Temporarily increase autovacuum aggressiveness globally while you clean up (optional):
  - autovacuum_vacuum_cost_limit higher
  - autovacuum_naptime lower
  - log_autovacuum_min_duration=0 (to see what's happening)
- If replicas exist, confirm they aren't pinning cleanup:
  - Avoid long-running queries on replicas
  - Consider hot_standby_feedback=off if acceptable, or kill stale replica queries so VACUUM can clean

Will you still need pg_repack/VACUUM FULL?
- Likely yes for the already-bloated relations. VACUUM cannot shrink file size; pg_repack or VACUUM FULL is the usual one-time fix. The Prefect and Postgres changes above are to prevent it from happening again (or at least make it much slower to re-accumulate).

Helpful references
- Settings reference (all env vars map to these settings): Prefect settings reference
- Worker health/polling: Worker healthchecks
- Automations models: Automations API
- General DB guidance: Database maintenance

A few clarifying questions to tailor the advice
- Roughly how many workers do you run per work pool, and what are their current polling intervals?
- Do you actively use automations? If so, about how many, and do they fire frequently?
- Do you have read replicas/logical replication? If yes, can you share outputs of the queries above to check for xmin pinning or long-running transactions?

If you share those details (and a snippet of pg_stat_user_tables for these relations), I can suggest concrete values for the Prefect settings and Postgres per-table autovacuum parameters.
z
@Marvin We have 2 workers per work pool polling between every 1 and 5 seconds (higher priority work pools poll more frequently). We have one automation that is actively used to delete long running flows but are considering removing it and using exclusively flow-level timeouts instead. We do not have read replicas or logical replication. Here is a snippet of pg_stat_user_tables exported as JSON:
```json
[
 {
  "relname": "deployment",
  "n_live_tup": "23",
  "n_dead_tup": "729162",
  "vacuum_count": "1",
  "autovacuum_count": "3606",
  "last_vacuum": "2025-09-26 13:43:03.068695+00",
  "last_autovacuum": "2025-09-26 15:44:54.261092+00",
  "analyze_count": "1",
  "last_analyze": "2025-09-26 13:43:03.147411+00",
  "last_autoanalyze": "2025-09-26 15:44:55.315895+00"
 },
 {
  "relname": "work_queue",
  "n_live_tup": "7",
  "n_dead_tup": "159001",
  "vacuum_count": "2",
  "autovacuum_count": "3973",
  "last_vacuum": "2025-09-26 15:30:04.158969+00",
  "last_autovacuum": "2025-09-26 15:44:43.740111+00",
  "analyze_count": "2",
  "last_analyze": "2025-09-26 15:30:04.194517+00",
  "last_autoanalyze": "2025-09-26 15:44:43.897996+00"
 },
 {
  "relname": "automation_bucket",
  "n_live_tup": "49",
  "n_dead_tup": "621554",
  "vacuum_count": "1",
  "autovacuum_count": "3476",
  "last_vacuum": "2025-09-26 13:21:05.602454+00",
  "last_autovacuum": "2025-09-26 15:44:15.25604+00",
  "analyze_count": "1",
  "last_analyze": "2025-09-26 13:21:05.635403+00",
  "last_autoanalyze": "2025-09-26 15:21:15.756233+00"
 },
 {
  "relname": "automation_event_follower",
  "n_live_tup": "34",
  "n_dead_tup": "376346",
  "vacuum_count": "0",
  "autovacuum_count": "3929",
  "last_vacuum": null,
  "last_autovacuum": "2025-09-26 15:44:20.581614+00",
  "analyze_count": "0",
  "last_analyze": null,
  "last_autoanalyze": "2025-09-26 15:31:45.765308+00"
 }
]
```
m
thought for 65.4 seconds
Super helpful details — thanks. Based on your setup and the stats you shared, here's a concise plan to cut the churn that's creating the dead tuples and to clean up what's already accumulated.

What's happening
- Two workers per pool polling every 1–5s means very frequent updates to `work_queue` and related status rows.
- Server loops (foreman/scheduler) also touch `deployment` frequently.
- Your one automation drives writes to `automation_bucket` and `automation_event_follower`.
- Autovacuum is running constantly (3.4k–4k runs) but cannot keep up with the continuous update rate; dead tuples accrue faster than they are reclaimed. VACUUM won't shrink files even when it reclaims space.

Prefect-side changes (reduce write frequency)
If you can tolerate slightly slower "recency" in the UI and pickup times, these settings will materially reduce bloat:

1) Workers (biggest win for `work_queue`)
- Increase polling and heartbeat:
```
PREFECT_WORKER_QUERY_SECONDS=15      # was 1–5; try 10–15 for high-priority, 30 for others
PREFECT_WORKER_HEARTBEAT_SECONDS=60  # was ~30
```
- Consider reducing to 1 worker for low-traffic pools.
- Tradeoff: slightly slower pickup and slower unhealthy detection.
- Docs: Worker healthchecks

2) Server services (reduces background writes to `work_queue` and `deployment`)
```
PREFECT_SERVER_SERVICES_FOREMAN_LOOP_SECONDS=60
PREFECT_SERVER_SERVICES_SCHEDULER_LOOP_SECONDS=120
PREFECT_SERVER_SERVICES_SCHEDULER_RECENT_DEPLOYMENTS_LOOP_SECONDS=30
```
- Settings reference: Settings reference

3) Events / Automations (targets `automation_*`)
- Since you're considering removing the automation, you can turn off triggers/event persistence entirely to stop most writes to these tables:
```
PREFECT_SERVER_SERVICES_TRIGGERS_ENABLED=false
PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=false
PREFECT_SERVER_EVENTS_STREAM_OUT_ENABLED=false
```
- If you must keep automations, increase batching/intervals to reduce churn:
```
PREFECT_SERVER_EVENTS_PROACTIVE_GRANULARITY=30   # default 5s; fewer bucket updates
PREFECT_SERVER_SERVICES_EVENT_PERSISTER_FLUSH_INTERVAL=30  # default 5s; larger, fewer writes
```
- Automations API overview: Automations API
- Settings reference: Settings reference

PostgreSQL-side tuning (let autovacuum keep up)
These tables are tiny but "hot"; tune autovacuum aggressively for them and enable HOT updates:

1) Per-table autovacuum thresholds
```sql
ALTER TABLE public.work_queue SET (
  autovacuum_vacuum_scale_factor = 0.0,
  autovacuum_vacuum_threshold    = 50,
  autovacuum_analyze_scale_factor= 0.0,
  autovacuum_analyze_threshold   = 50,
  autovacuum_vacuum_cost_limit   = 2000
);

ALTER TABLE public.deployment SET (
  autovacuum_vacuum_scale_factor = 0.0,
  autovacuum_vacuum_threshold    = 50,
  autovacuum_analyze_scale_factor= 0.0,
  autovacuum_analyze_threshold   = 50,
  autovacuum_vacuum_cost_limit   = 2000
);

ALTER TABLE public.automation_event_follower SET (
  autovacuum_vacuum_scale_factor = 0.0,
  autovacuum_vacuum_threshold    = 50,
  autovacuum_analyze_scale_factor= 0.0,
  autovacuum_analyze_threshold   = 50,
  autovacuum_vacuum_cost_limit   = 2000
);

ALTER TABLE public.automation_bucket SET (
  autovacuum_vacuum_scale_factor = 0.0,
  autovacuum_vacuum_threshold    = 50,
  autovacuum_analyze_scale_factor= 0.0,
  autovacuum_analyze_threshold   = 50,
  autovacuum_vacuum_cost_limit   = 2000
);
```
- Rationale: for tiny tables, a small absolute threshold is better than the default scale factor. Higher cost limit lets VACUUM finish quickly.

2) Enable HOT updates via fillfactor and rewrite once
```sql
ALTER TABLE public.work_queue SET (fillfactor = 80);
ALTER TABLE public.deployment SET (fillfactor = 80);
ALTER TABLE public.automation_event_follower SET (fillfactor = 80);
ALTER TABLE public.automation_bucket SET (fillfactor = 80);
```
- Then do a one-time rewrite to apply page free space:
  - Prefer pg_repack (minimal locking); otherwise `VACUUM FULL` during a maintenance window.
  - Example:
```bash
# Using pg_repack (recommended). pg_repack is driven by its command-line client
# (run from a shell, with the pg_repack extension installed in the database),
# not by a plain SQL call. Assuming your database is named "prefect":
pg_repack --dbname=prefect --table=public.work_queue
pg_repack --dbname=prefect --table=public.deployment
pg_repack --dbname=prefect --table=public.automation_event_follower
pg_repack --dbname=prefect --table=public.automation_bucket
```
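Once the lower fillfactor is in place, you can watch whether updates are actually taking the HOT path (a monitoring sketch using standard pg_stat_user_tables columns):
```sql
-- Fraction of updates that were HOT (no index maintenance needed);
-- this should climb after the fillfactor change
SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
WHERE relname IN ('deployment','automation_event_follower','work_queue','automation_bucket');
```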
3) Optional: reindex the hottest indexes after the rewrite
```sql
REINDEX INDEX CONCURRENTLY <index_name>;
```
(You can identify the largest/bloatiest indexes with pg_stat_user_indexes or pgstattuple.)
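For example, a size-and-usage listing along these lines (a sketch) makes the candidates obvious:
```sql
-- Largest indexes on the hot tables, with how often they are actually scanned
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE relname IN ('deployment','automation_event_follower','work_queue','automation_bucket')
ORDER BY pg_relation_size(indexrelid) DESC;
```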

Why your VACUUMs didn’t materially reduce the counts
- Your `autovacuum_count` is very high and `last_autovacuum` is recent for all tables — they are running, but dead tuples are re‑created faster than VACUUM reclaims them.
- VACUUM does not shrink file sizes; only marks space reusable. After you slow the churn and do a one-time pg_repack/VACUUM FULL, the size and dead-tuple counters will normalize.

Quick runbook
1) Apply the Prefect settings above (workers, foreman/scheduler, and disable triggers/events if you remove the automation).
2) Set per-table autovacuum params and fillfactor.
3) Run pg_repack once for the four tables to reclaim space.
4) Monitor:
```sql
SELECT relname, n_live_tup, n_dead_tup, vacuum_count, autovacuum_count, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('deployment','automation_event_follower','work_queue','automation_bucket');
```
5) Sanity-check blockers while monitoring (just in case):
```sql
SELECT pid, usename, state, xact_start, now()-xact_start AS xact_age, query
FROM pg_stat_activity
WHERE datname = current_database() AND xact_start IS NOT NULL
ORDER BY xact_start;

SELECT * FROM pg_stat_progress_vacuum;
```
On removing the automation
- If you switch to flow-level timeouts, disabling the triggers service and event persister will dramatically reduce writes to `automation_*` and event tables. That's the cleanest path given your goals.

If you want, share:
- The number of work pools and whether some are cold; we can suggest pool-specific polling values.
- `pg_stat_user_indexes` for these relations; we can point out which indexes to reindex after repack.

References
- Settings reference: Prefect settings reference
- Worker health/polling: Worker healthchecks
- Automations API overview: Automations API
- General DB guidance: Database maintenance
z
@Marvin Running
```sql
SELECT pid, usename, state, xact_start, now()-xact_start AS xact_age, query
FROM pg_stat_activity
WHERE datname = current_database() AND xact_start IS NOT NULL
ORDER BY xact_start;
```
shows 6 `active` and 4 `idle in transaction` queries over a day old. 1 is an autovacuum on `event_resources` (which is currently our biggest table), 2 are inserts on `automation_bucket`, and the rest are deletes from `event_resources` from when we tried to delete older records in that table to free up space. Regarding your suggestions, since it's important for us to have flow runs start quickly after being created, we likely won't increase polling unless it's necessary. The other suggestions are good though.
m
thought for 84.4 seconds
Thanks — those "day-old" transactions are the smoking gun. Long-running transactions (especially idle in transaction) hold the global xmin horizon back. When xmin is pinned:
- VACUUM cannot reclaim dead tuples on any affected table, even if it runs frequently (which your autovacuum stats show)
- Dead tuples pile up on hot, tiny tables like work_queue, deployment, automation_*

Immediate remediation

1) Kill the long transactions and prevent them going forward
- One-time cleanup:
```sql
-- terminate "idle in transaction" sessions older than 10 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = current_database()
  AND state = 'idle in transaction'
  AND now() - xact_start > interval '10 minutes';
```
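If you'd rather review the candidates before terminating anything, the same filter can be run read-only first (a sketch):
```sql
-- Preview which sessions the terminate statement above would hit
SELECT pid, usename, xact_start, now() - xact_start AS idle_for, left(query, 80) AS last_query
FROM pg_stat_activity
WHERE datname = current_database()
  AND state = 'idle in transaction'
  AND now() - xact_start > interval '10 minutes'
ORDER BY xact_start;
```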
- Enforce timeouts so it doesn't recur:
```sql
-- Global (requires superuser)
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();

-- Optionally protect the app by limiting long statements too
-- (set per role/database so you don't disrupt admin operations)
ALTER ROLE prefect SET statement_timeout = '2min';
```
- For your maintenance scripts/tools: ensure they COMMIT promptly and avoid leaving sessions open after large operations.

2) Fix your event_resources cleanup process
Large, monolithic DELETEs cause long transactions and huge bloat. Switch to batched deletes:
```sql
-- repeat this until 0 rows; keep each transaction short
WITH c AS (
  SELECT id
  FROM event_resources
  WHERE created < now() - interval '7 days'
  ORDER BY id
  LIMIT 10000
)
DELETE FROM event_resources e
USING c
WHERE e.id = c.id;

VACUUM (ANALYZE) event_resources;   -- run periodically between batches
```
- Ensure there's an index supporting your retention predicate (e.g., on created).
- Even better long-term: time-based partitioning so you can DROP old partitions instead of DELETE. Prefect has a nice walkthrough here: Partitioning without downtime.

3) Consider your events/automations choice
Since you're open to replacing the automation with flow-level timeouts:
- Disabling these will immediately slash writes to automation_* and events tables:
```
PREFECT_SERVER_SERVICES_TRIGGERS_ENABLED=false
PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=false
PREFECT_SERVER_EVENTS_STREAM_OUT_ENABLED=false
```
- If you keep automations, reduce cadence/batch:
```
PREFECT_SERVER_EVENTS_PROACTIVE_GRANULARITY=30
PREFECT_SERVER_SERVICES_EVENT_PERSISTER_FLUSH_INTERVAL=30
```
Refs: Settings reference, Automations API

4) Keep fast starts while cutting some churn
You can keep 1–5s polling on your highest-priority pool and:
- Use 1 worker (not 2) for low-traffic pools, or
- Keep those at 10–15s polling
This preserves fast starts where you need them while reducing background updates to work_queue overall.

Postgres tuning for your "hot but tiny" tables
These help autovacuum keep up and reduce index churn:
- Per-table autovacuum thresholds (tiny absolute thresholds work best for small hot tables):
```sql
ALTER TABLE public.work_queue SET (
  autovacuum_vacuum_scale_factor  = 0.0,
  autovacuum_vacuum_threshold     = 50,
  autovacuum_analyze_scale_factor = 0.0,
  autovacuum_analyze_threshold    = 50,
  autovacuum_vacuum_cost_limit    = 2000
);
ALTER TABLE public.deployment SET (
  autovacuum_vacuum_scale_factor  = 0.0,
  autovacuum_vacuum_threshold     = 50,
  autovacuum_analyze_scale_factor = 0.0,
  autovacuum_analyze_threshold    = 50,
  autovacuum_vacuum_cost_limit    = 2000
);
ALTER TABLE public.automation_event_follower SET (
  autovacuum_vacuum_scale_factor  = 0.0,
  autovacuum_vacuum_threshold     = 50,
  autovacuum_analyze_scale_factor = 0.0,
  autovacuum_analyze_threshold    = 50,
  autovacuum_vacuum_cost_limit    = 2000
);
ALTER TABLE public.automation_bucket SET (
  autovacuum_vacuum_scale_factor  = 0.0,
  autovacuum_vacuum_threshold     = 50,
  autovacuum_analyze_scale_factor = 0.0,
  autovacuum_analyze_threshold    = 50,
  autovacuum_vacuum_cost_limit    = 2000
);
```
- Enable HOT updates by reserving page space; then do a one-time rewrite:
```sql
ALTER TABLE public.work_queue SET (fillfactor = 80);
ALTER TABLE public.deployment SET (fillfactor = 80);
ALTER TABLE public.automation_event_follower SET (fillfactor = 80);
ALTER TABLE public.automation_bucket SET (fillfactor = 80);
```
```bash
# Repack preferred to avoid long locks; pg_repack is driven by its CLI client
# (run from a shell), assuming your database is named "prefect"
pg_repack --dbname=prefect --table=public.work_queue
pg_repack --dbname=prefect --table=public.deployment
pg_repack --dbname=prefect --table=public.automation_event_follower
pg_repack --dbname=prefect --table=public.automation_bucket
```
- For your huge event_resources work: consider temporarily bumping
  - autovacuum_max_workers (e.g., to 5)
  - autovacuum_work_mem (larger)
  - autovacuum_vacuum_cost_limit (larger)
so autovacuum can finish passes faster.

About the autovacuum running >1 day on event_resources
- That’s expected on very large tables, but it can also be slowed or blocked by those long transactions. Once you terminate the long “idle in transaction” sessions and switch to batched deletes, autovacuum should make visible progress. Use:
```sql
SELECT * FROM pg_stat_progress_vacuum
WHERE relid::regclass = 'public.event_resources'::regclass;
```
to see the current phase and heap_blks_vacuumed/total.
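If it helps, the same view can also be read as rough percentages (a sketch using the view's standard columns):
```sql
-- Rough progress of the running VACUUM on event_resources
SELECT phase,
       round(100.0 * heap_blks_scanned  / NULLIF(heap_blks_total, 0), 1) AS pct_scanned,
       round(100.0 * heap_blks_vacuumed / NULLIF(heap_blks_total, 0), 1) AS pct_vacuumed,
       index_vacuum_count, num_dead_tuples, max_dead_tuples
FROM pg_stat_progress_vacuum
WHERE relid = 'public.event_resources'::regclass;
```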

Finally, after you:
- Kill long/idle transactions
- Reduce event/automation churn (or disable them)
- Tune autovacuum + fillfactor
Run a one-time pg_repack (or VACUUM FULL during a window) on the four hot tables to reclaim the existing bloat. After that, watch:
```sql
SELECT relname, n_live_tup, n_dead_tup, vacuum_count, autovacuum_count, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('deployment','automation_event_follower','work_queue','automation_bucket');
```
If you want, share the output of:
- pg_stat_progress_vacuum for event_resources
- pg_stat_user_indexes for those four tables
and I'll suggest specific index reindexing and confirm autovacuum is now able to clean up promptly.

Useful refs:
- Settings reference: Prefect settings reference
- Worker health/polling: Worker healthchecks
- Partitioning approach: Partitioning without downtime
- General DB guidance: Database maintenance
z
@Marvin here is `pg_stat_progress_vacuum` for the pid associated with the autovacuum on `event_resources` that has been going for over a day:
```json
[
 {
  "pid": 3757068,
  "datid": 16545,
  "datname": "prefect",
  "relid": 17526,
  "phase": "vacuuming indexes",
  "heap_blks_total": "66264099",
  "heap_blks_scanned": "16074895",
  "heap_blks_vacuumed": "15814322",
  "index_vacuum_count": "12",
  "max_dead_tuples": "11184809",
  "num_dead_tuples": "11184527"
 }
]
```
and here are the top 10 largest indexes of the tables ('event_resources', 'deployment','automation_event_follower','work_queue','automation_bucket') in `pg_stat_user_indexes`:
```json
[
 {
  "schemaname": "public",
  "relname": "event_resources",
  "indexrelname": "ix_event_resources__resource_id__occurred",
  "indexrelid": 17536,
  "index_size": "299 GB",
  "idx_scan": "4434",
  "idx_tup_read": "170422",
  "idx_tup_fetch": "170422"
 },
 {
  "schemaname": "public",
  "relname": "event_resources",
  "indexrelname": "pk_event_resources",
  "indexrelid": 17534,
  "index_size": "116 GB",
  "idx_scan": "286968305",
  "idx_tup_read": "286968305",
  "idx_tup_fetch": "286903669"
 },
 {
  "schemaname": "public",
  "relname": "event_resources",
  "indexrelname": "ix_event_resources__updated",
  "indexrelid": 17537,
  "index_size": "86 GB",
  "idx_scan": "31536",
  "idx_tup_read": "1099565722",
  "idx_tup_fetch": "59020"
 },
 {
  "schemaname": "public",
  "relname": "automation_bucket",
  "indexrelname": "uq_automation_bucket__automation_id__trigger_id__bucketing_key",
  "indexrelid": 17525,
  "index_size": "94 MB",
  "idx_scan": "1771367",
  "idx_tup_read": "2742369",
  "idx_tup_fetch": "432367"
 },
 {
  "schemaname": "public",
  "relname": "deployment",
  "indexrelname": "trgm_ix_deployment_name",
  "indexrelid": 17072,
  "index_size": "78 MB",
  "idx_scan": "0",
  "idx_tup_read": "0",
  "idx_tup_fetch": "0"
 },
 {
  "schemaname": "public",
  "relname": "automation_event_follower",
  "indexrelname": "uq_follower_for_scope",
  "indexrelid": 17588,
  "index_size": "37 MB",
  "idx_scan": "18641332",
  "idx_tup_read": "462154",
  "idx_tup_fetch": "0"
 },
 {
  "schemaname": "public",
  "relname": "automation_event_follower",
  "indexrelname": "ix_ae_follower_scope_leader",
  "indexrelid": 161333954,
  "index_size": "31 MB",
  "idx_scan": "0",
  "idx_tup_read": "0",
  "idx_tup_fetch": "0"
 },
 {
  "schemaname": "public",
  "relname": "automation_bucket",
  "indexrelname": "ix_automation_bucket__automation_id__end",
  "indexrelid": 17470,
  "index_size": "25 MB",
  "idx_scan": "206772",
  "idx_tup_read": "61840214561",
  "idx_tup_fetch": "3530"
 },
 {
  "schemaname": "public",
  "relname": "automation_event_follower",
  "indexrelname": "ix_automation_event_follower__leader_event_id",
  "indexrelid": 17522,
  "index_size": "24 MB",
  "idx_scan": "33065342",
  "idx_tup_read": "417684",
  "idx_tup_fetch": "28"
 },
 {
  "schemaname": "public",
  "relname": "automation_bucket",
  "indexrelname": "ix_automation_bucket__updated",
  "indexrelid": 17471,
  "index_size": "18 MB",
  "idx_scan": "0",
  "idx_tup_read": "0",
  "idx_tup_fetch": "0"
 }
]
```
m
thought for 95.9 seconds
This is great signal. A few key takeaways from your stats:
- Autovacuum on event_resources is stuck in "vacuuming indexes" with num_dead_tuples ≈ max_dead_tuples and index_vacuum_count = 12. That's a classic sign that autovacuum_work_mem/maintenance_work_mem are too small for the amount of dead tuples, forcing many repeated passes over extremely large indexes.
- event_resources indexes are huge (e.g., 299 GB + 116 GB + 86 GB). Vacuuming those with small memory and continuous new writes is very expensive.
- There are "idle in transaction" sessions older than a day — these will pin the xmin horizon and prevent cleanup elsewhere, compounding bloat.

A focused, low-latency plan (keeps fast starts, avoids increasing poll interval)

Immediate actions

1) Terminate long transactions and add guardrails
```sql
-- kill long idle sessions (run a few times a day via cron)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = current_database()
  AND state = 'idle in transaction'
  AND now() - xact_start > interval '10 minutes';

-- prevent recurrence
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();
```
2) Reduce (or pause) writes to events/automations while you do maintenance
- Since you're open to removing the automation, disable these to stop most writes to automation_* and events tables:
```
PREFECT_SERVER_SERVICES_TRIGGERS_ENABLED=false
PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=false
PREFECT_SERVER_EVENTS_STREAM_OUT_ENABLED=false
```
- If you can't disable, plan a short maintenance window where you shut down the Prefect server API or just the event persister/triggers to let vacuum and cleanup catch up.

3) Raise memory for vacuum so it can finish index cleanup in fewer passes
- In postgresql.conf:
```
# Raise for the maintenance window (and keep higher if you can)
maintenance_work_mem = '2GB'         # or higher if you have RAM
autovacuum_work_mem = '2GB'          # PG14 supports per-worker memory
vacuum_cost_limit = 2000             # let vacuum do more work per cycle
```
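One cross-check worth noting: max_dead_tuples = 11,184,809 works out to roughly 64 MB at 6 bytes per dead-tuple pointer, which suggests these memory settings are still at the 64 MB default (an inference from your numbers, worth confirming). You can check and raise them from SQL as well, e.g. (a sketch; ALTER SYSTEM requires superuser):
```sql
-- Current values (11,184,809 dead-tuple slots * 6 bytes ≈ 64 MB, the default)
SHOW maintenance_work_mem;
SHOW autovacuum_work_mem;

-- Equivalent of the postgresql.conf edits above; autovacuum_work_mem is
-- picked up by new autovacuum workers after a reload
ALTER SYSTEM SET maintenance_work_mem = '2GB';
ALTER SYSTEM SET autovacuum_work_mem = '2GB';
ALTER SYSTEM SET vacuum_cost_limit = 2000;
SELECT pg_reload_conf();
```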
- Reload and let the autovacuum continue; if you still see it spinning on "vacuuming indexes", consider a manual run during low-traffic:
```sql
VACUUM (VERBOSE, ANALYZE, PARALLEL 4) public.event_resources;
```
Note: Autovacuum does not use parallel vacuum; a manual VACUUM can. Disable autovacuum on this table during the manual run if needed:
```sql
ALTER TABLE public.event_resources SET (autovacuum_enabled = false);
-- run manual VACUUM, then re-enable:
ALTER TABLE public.event_resources RESET (autovacuum_enabled);
```
Safer retention cleanup for event_resources
- Avoid long DELETEs (they create long transactions and more bloat). Do small, indexed batches:

1) Ensure there's an index that matches your retention predicate
- If you delete by occurred, create a dedicated index:
```sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_event_resources__occurred
  ON public.event_resources (occurred);
```
(Your largest index is on (resource_id, occurred); that's not ideal for a simple time-based DELETE.)

2) Batch deletes in small chunks (commit every batch)
```sql
-- repeat until 0 rows affected
WITH c AS (
  SELECT id
  FROM public.event_resources
  WHERE occurred < now() - interval '7 days'
  ORDER BY id
  LIMIT 10000
)
DELETE FROM public.event_resources e
USING c
WHERE e.id = c.id;

VACUUM (ANALYZE) public.event_resources;  -- occasionally between batches
```
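Before looping that at scale, it can be worth confirming which index the planner picks for the batch-selection step (a sketch; plain EXPLAIN does not execute the DELETE):
```sql
-- Check whether the new occurred index (or the primary key) drives the batch selection
EXPLAIN
SELECT id
FROM public.event_resources
WHERE occurred < now() - interval '7 days'
ORDER BY id
LIMIT 10000;
```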
- If you delete by updated instead, index updated similarly. Best long-term: partition event tables by time, then DROP old partitions. See Prefect's partitioning write-up: Partitioning without downtime.

Reclaiming bloat efficiently
- After you've reduced churn and done retention deletes, reclaim space. For event_resources and the four hot control tables:
- Prefer pg_repack (concurrent, minimal locks); the pg_repack CLI client, run from a shell, assuming your database is named "prefect":
```bash
pg_repack --dbname=prefect --table=public.event_resources
pg_repack --dbname=prefect --table=public.work_queue
pg_repack --dbname=prefect --table=public.deployment
pg_repack --dbname=prefect --table=public.automation_event_follower
pg_repack --dbname=prefect --table=public.automation_bucket
```
- If pg_repack isn’t available, use VACUUM FULL in a maintenance window (exclusive lock).

- Reindex the largest/bloatiest indexes after cleanup (especially on event_resources):
```sql
REINDEX INDEX CONCURRENTLY public.ix_event_resources__resource_id__occurred;
REINDEX INDEX CONCURRENTLY public.pk_event_resources;
REINDEX INDEX CONCURRENTLY public.ix_event_resources__updated;
```
Drop obviously-unused indexes to reduce vacuum work
- trgm_ix_deployment_name shows 0 scans. If you don’t need fuzzy search over deployment names, drop it:
```sql
DROP INDEX CONCURRENTLY IF EXISTS public.trgm_ix_deployment_name;
```
(Spot-check with EXPLAIN on your typical queries first.)

Keep fast starts without raising poll intervals
- Keep 1–5s polling on your highest-priority pools.
- For low/medium priority pools, consider:
  - 1 worker instead of 2, or
  - 10–15s polling on those only.
This reduces total write pressure to work_queue without affecting critical pools.

Per-table autovacuum and HOT for the hot but tiny tables
These help prevent re-bloat on work_queue, deployment, automation_*:
```sql
ALTER TABLE public.work_queue SET (
  autovacuum_vacuum_scale_factor  = 0.0,
  autovacuum_vacuum_threshold     = 50,
  autovacuum_analyze_scale_factor = 0.0,
  autovacuum_analyze_threshold    = 50,
  autovacuum_vacuum_cost_limit    = 2000,
  fillfactor = 80
);
ALTER TABLE public.deployment SET (
  autovacuum_vacuum_scale_factor  = 0.0,
  autovacuum_vacuum_threshold     = 50,
  autovacuum_analyze_scale_factor = 0.0,
  autovacuum_analyze_threshold    = 50,
  autovacuum_vacuum_cost_limit    = 2000,
  fillfactor = 80
);
ALTER TABLE public.automation_event_follower SET (
  autovacuum_vacuum_scale_factor  = 0.0,
  autovacuum_vacuum_threshold     = 50,
  autovacuum_analyze_scale_factor = 0.0,
  autovacuum_analyze_threshold    = 50,
  autovacuum_vacuum_cost_limit    = 2000,
  fillfactor = 80
);
ALTER TABLE public.automation_bucket SET (
  autovacuum_vacuum_scale_factor  = 0.0,
  autovacuum_vacuum_threshold     = 50,
  autovacuum_analyze_scale_factor = 0.0,
  autovacuum_analyze_threshold    = 50,
  autovacuum_vacuum_cost_limit    = 2000,
  fillfactor = 80
);
```
After setting fillfactor, do a one-time pg_repack for those tables to apply page free space.

Why your autovacuum is cycling
- The "vacuuming indexes" phase shows num_dead_tuples at the memory limit (max_dead_tuples). With gigantic indexes and constant writes, autovacuum must loop many times to clean them. Increasing autovacuum_work_mem and pausing event writes lets it complete in fewer passes.

Helpful Prefect links for the settings we discussed
- Settings reference: Prefect settings reference
- Worker health/polling: Worker healthchecks
- Automations API: Automations API
- Database maintenance: Database maintenance
- Partitioning blog: Partitioning without downtime

If you can share:
- Which column you use for event retention (occurred vs updated)
- The output of EXPLAIN on your retention DELETE
I can confirm the best supporting index and tune the batch size/timing.
z
@Marvin is the `event_resources` table used for anything internally within Prefect other than just logging? Hypothetically, would this issue be resolved temporarily with a truncate?
m
thought for 65.6 seconds
Short answer
- event_resources is part of Prefect's events subsystem, not core orchestration. It's used for event persistence/queries (UI event timeline, counts) and for automations' lookback/history. It is not the "logs" table and is not required for flow scheduling/execution.
- Yes, a TRUNCATE would immediately clear it and reclaim space, but only if you can tolerate losing event history (and with automations disabled or paused). It will take an ACCESS EXCLUSIVE lock and you should pause event ingestion first. It fixes the symptom quickly but not the root cause.

Details and safe procedure

What uses event_resources
- Events UI and API (read_events/count/read_next_page). See events storage functions: Settings reference and the events modules in server code.
- Automations: triggers can use persisted events to evaluate conditions over a look-back window or catch up if they miss streaming. If automations are disabled, nothing internal depends on event history.

If you can remove/disable automations and event persistence
1) Stop/disable event writers so no new events arrive during maintenance:
```
PREFECT_SERVER_SERVICES_TRIGGERS_ENABLED=false
PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=false
PREFECT_SERVER_EVENTS_STREAM_OUT_ENABLED=false
```
2) Cancel the long-running autovacuum on event_resources if still active:
```sql
SELECT pg_cancel_backend(pid) FROM pg_stat_activity
WHERE pid = <autovacuum_pid> AND query ILIKE '%event_resources%';
```
3) Truncate event tables. Start with event_resources, but in practice you'll likely want to clear the entire events store to keep FKs happy and prevent skew:
```sql
-- Adjust list to your schema; commonly:
TRUNCATE TABLE public.event_resources, public.events CASCADE;
```
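One way to build that "adjust to your schema" list yourself (a sketch) is to ask the catalog which tables reference public.events via foreign keys, and include those in the same TRUNCATE:
```sql
-- Tables with foreign keys pointing at public.events
SELECT conrelid::regclass AS referencing_table, conname
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'public.events'::regclass;
```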
4) Optionally reclaim index bloat right away (TRUNCATE empties indexes, but if you'd rather rebuild, do it after traffic resumes):
```sql
REINDEX TABLE CONCURRENTLY public.event_resources;
REINDEX TABLE CONCURRENTLY public.events;
```
5) Re-enable services if you still want them writing again later.

If you must keep automations
- TRUNCATE will wipe their history/lookback; some triggers may miss or re-evaluate unexpectedly. If you must keep them, do not truncate. Instead:
  - Lower events retention (default is 7d) to a smaller window to keep tables small: PREFECT_SERVER_EVENTS_RETENTION_PERIOD=2d (via ServerEventsSettings.retention_period)
  - Batch-delete old rows with short transactions, then VACUUM/pg_repack, as we outlined earlier.

Notes/caveats
- TRUNCATE acquires an ACCESS EXCLUSIVE lock; do it during a brief maintenance window with events ingestion paused. It is transactional and fast for very large tables.
- TRUNCATE clears table contents and resets visibility; it effectively eliminates table and index bloat for those relations in one shot.
- This will remove historical events from the UI. It does not impact Prefect's flow/task orchestration or scheduling.
- The root cause remains high write rate + long-lived transactions. Keep the guardrails you implemented (idle_in_transaction_session_timeout, batched deletes, increased autovacuum_work_mem/maintenance_work_mem) to prevent recurrence.

Given your plan to move away from the single automation to flow-level timeouts, the cleanest path is:
- Disable triggers and event persister
- TRUNCATE events tables (events + event_resources, plus any related event tables in your schema)
- Keep those services off to prevent re-growth
- Optionally re-enable only after you've also reduced retention and memory-tuned autovacuum

If you want, paste your exact list of event-related tables from pg_class/pg_constraint and I'll give you a precise TRUNCATE set that preserves referential integrity.
z
@Marvin Update: killing the long running transactions (except the event_resources autovacuum, which we did not kill; not sure how safe that would be) seemed to have fixed the issue with the small tables with high dead-tuple bloat. We are in the process of disabling automations, but are concerned about the effects of stopping event writes since we do utilize `on_completion`, `on_failure`, etc. within our flow decorators.
m
thought for 83.1 seconds
Great progress on killing the long-running transactions — that explains why the dead tuples on the small, hot tables dropped so quickly.

Re: disabling automations and stopping event writes vs on_completion/on_failure
- Your on_completion/on_failure (and related) hooks on the flow decorator are runtime callbacks executed by the flow runner around state transitions. They do not depend on the events subsystem or the event tables.
- Disabling the events persister and/or triggers will not stop those hooks from running or prevent your flows from starting/finishing normally. Core orchestration (flow/task runs, states, logs, scheduling) is separate from events.

What you will lose if you stop event writes
- Event history/timeline in the UI and any API queries that read past events.
- Automations that rely on events (you're planning to disable these anyway).
- Live event streaming can be kept if you want ephemeral real-time updates without persistence (see below).

Safe minimal change that preserves your hooks
- Keep your flow decorator hooks as-is; they're unaffected.
- Disable event persistence (to stop writes to event_resources/events) and disable automations; optionally keep streaming enabled if you want ephemeral, in-session updates:
```
PREFECT_SERVER_SERVICES_TRIGGERS_ENABLED=false
PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=false
# optional: keep real-time stream to subscribers/UI sessions without DB writes
PREFECT_SERVER_EVENTS_STREAM_OUT_ENABLED=true
```
If you prefer to retain some event history instead of turning it off
- Keep the persister enabled but set a short retention to keep tables small (and switch your cleanup to batched deletes):
```
# e.g., 2 days; use a short window you’re comfortable with
PREFECT_SERVER_EVENTS_RETENTION_PERIOD=2d   # or 172800s
```
- Continue enforcing idle_in_transaction_session_timeout and use batched deletes to avoid long transactions.

Notes
- Logs are separate from events; disabling event persistence does not disable log-to-API.
- If you ever need to "hard reset" the event tables, a TRUNCATE will not impact your on_* hooks or scheduling, but do it in a brief maintenance window with the event persister disabled.

References
- Settings reference (env vars above map here): Prefect settings reference
- Worker health/polling (unrelated to events but useful context): Worker healthchecks

If you want, share how you use the on_* hooks (e.g., sending notifications, cleanup), and I can flag any edge cases. But in typical usage, turning off event persistence won't affect them.