# ask-marvin
a
@Marvin We run self-hosted Prefect, version 3.4.6, with a single API server. The server has gotten stuck a couple of times since we deployed it, with errors like:
```text
insert or update on table "automation_bucket" violates foreign key constraint "fk_automation_bucket__automation_id__automation"
```
Am I correct that the recommended course of action is to upgrade? Based on the release notes, to 3.4.14?
m
thought for 143.7 seconds
Short answer: yes, upgrading is the right move, and I'd recommend going beyond 3.4.14 to the latest 3.4.x patch. I can't find a release note that explicitly mentions this specific FK violation on `automation_bucket`, but there have been a number of automations- and DB-related fixes across the 3.4.x line.

Useful links:
- 3.4 release notes (rolling): Docs: Prefect 3.4 release notes
- GitHub releases (to pick the latest 3.4.x): GitHub: Prefect releases
- The 3.4.14 tag you referenced: 3.4.14 release
- Server/DB CLI docs: Server CLI docs

Recommended path forward

1) Upgrade to the latest 3.4.x and run DB migrations
- Back up your database (a backup sketch follows at the end of this step).
- Plan a short maintenance window.
- Upgrade the Server image/version, then run:
```bash
prefect server database upgrade -y
```
- Alternatively, set this for automatic migrations at startup:
```bash
PREFECT_SERVER_DATABASE_MIGRATE_ON_START=true
```
- If you have a large DB, increase DB timeout:
```bash
PREFECT_API_DATABASE_TIMEOUT=600
```
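For the backup step, here's a minimal sketch assuming direct `pg_dump` access to the Cloud SQL instance; the host, user, and database name are placeholders, and a Cloud SQL export via the console or gcloud works just as well:
```bash
# Placeholder host/user/database; adjust to your Cloud SQL setup.
# --format=custom produces a compressed dump restorable with pg_restore.
pg_dump \
  --host=<cloudsql-host> \
  --username=<prefect-user> \
  --format=custom \
  --file=prefect_backup_$(date +%Y%m%d_%H%M%S).dump \
  <prefect-database>
```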
2) Quick integrity check in Postgres (to spot existing inconsistencies)
```sql
SELECT b.id, b.automation_id
FROM automation_bucket b
LEFT JOIN automation a ON a.id = b.automation_id
WHERE a.id IS NULL;
```
If this returns rows, there are orphaned bucket entries; you may need to clean them up. If the FK violation occurs on insert, it can also indicate a race condition that later patches have addressed.

3) Mitigation while you plan the upgrade
- Consider pausing and resuming automations to force a clean state:
```bash
prefect automations ls
prefect automations pause <automation-id>
prefect automations resume <automation-id>
```
- If the server is "stuck" due to repeated failures, temporarily pausing problematic automations can stabilize the API while you upgrade.

A couple of clarifying questions to help us be more specific:
- Which Postgres version are you using?
- Did these errors start after creating/updating/deleting many automations in a short time?
- Are these errors transient, or do they persist until a restart?
- Any chance there were partial or failed migrations in this environment?

If you share the server logs around the FK violations (and the output of the query above), I can help pinpoint whether this is a data integrity issue in your DB or something that should be fully resolved by upgrading.
a
@Marvin Some answers: we are using Postgres 15, Cloud SQL db-custom-2-8192 tier. No, these errors did not occur during lots of create/update/delete events. They do persist until a restart, and as a consequence, we are not triggering automation events.
m
thought for 79.3 seconds
Thanks for the details. Given you're on 3.4.6 and the FK violation persists until a restart (blocking automations), upgrading to the latest 3.4.x is the right next step. I can't find a release note that explicitly calls out a fix for `automation_bucket` FK violations in 3.4.14, but there have been various automations/DB hardening changes across 3.4.x. I'd recommend going to the newest 3.4.x patch rather than stopping at 3.4.14.

Helpful links:
- 3.4 release notes: Docs: Prefect 3.4 release notes
- GitHub releases: GitHub: Prefect releases
- Server/DB CLI docs: Server CLI docs

Action plan

1) Immediate mitigation (avoid a full restart)
- Identify and terminate any stuck DB session that's repeatedly failing on `automation_bucket` (this often clears the "persists until restart" behavior):
```sql
-- Find likely offenders
SELECT pid, usename, state, query
FROM pg_stat_activity
WHERE query ILIKE '%automation_bucket%';

-- Carefully terminate the offending backend(s)
SELECT pg_terminate_backend(<pid>);
```
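If nothing useful shows up by query text, sessions sitting `idle in transaction` are another common reason errors persist until a restart; this is a standard Postgres check, nothing Prefect-specific:
```sql
-- Sessions holding a transaction open, oldest first; long-idle ones are
-- candidates for pg_terminate_backend() once you've confirmed what they are.
SELECT pid, usename, state,
       now() - xact_start AS xact_age,
       left(query, 80) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
```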
- Optionally pause automations while stabilizing:
```bash
prefect automations ls
prefect automations pause <automation-id>
# later
prefect automations resume <automation-id>
```
2) Check for data integrity issues
- Look for orphaned `automation_bucket` rows referencing missing automations:
```sql
SELECT DISTINCT b.automation_id
FROM automation_bucket b
LEFT JOIN automation a ON a.id = b.automation_id
WHERE a.id IS NULL;
```
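If you want to gauge the blast radius, here's the same join aggregated per missing automation id (purely illustrative, same tables as above):
```sql
-- How many orphaned bucket rows each missing automation id accounts for
SELECT b.automation_id, count(*) AS orphaned_rows
FROM automation_bucket b
LEFT JOIN automation a ON a.id = b.automation_id
WHERE a.id IS NULL
GROUP BY b.automation_id
ORDER BY orphaned_rows DESC;
```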
- If any rows appear, back up first, then consider cleaning them:
```sql
DELETE FROM automation_bucket b
WHERE NOT EXISTS (
  SELECT 1 FROM automation a WHERE a.id = b.automation_id
);
```
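As an extra safety net on top of the backup, the same cleanup can be run inside an explicit transaction so the affected row count can be checked before committing (plain Postgres, shown here as a sketch):
```sql
BEGIN;

-- Same cleanup as above; psql reports "DELETE <n>" after this statement.
DELETE FROM automation_bucket b
WHERE NOT EXISTS (
  SELECT 1 FROM automation a WHERE a.id = b.automation_id
);

-- If the count matches what the orphan-check query returned, commit;
-- otherwise roll back and investigate.
COMMIT;  -- or ROLLBACK;
```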
Note: your error is on insert/update, but existing orphans can keep error paths hot; cleaning them up can help.

3) Upgrade and migrate
- Back up the DB and schedule a short maintenance window.
- Upgrade Prefect to the latest 3.4.x, then run:
```bash
prefect server database upgrade -y
```
- To auto-apply migrations on container start:
```bash
PREFECT_SERVER_DATABASE_MIGRATE_ON_START=true
```
- For larger DBs, increase DB timeout:
```bash
PREFECT_API_DATABASE_TIMEOUT=600
```
4) Observe post-upgrade
- If errors persist, enable debug logging on the server to capture the automation id and query details:
```bash
PREFECT_LOGGING_LEVEL=DEBUG
```
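Once debug logging is on, something like this pulls the relevant context out of the server logs; the log file path is a placeholder, so substitute however you collect logs (for example `kubectl logs` or `docker logs`):
```bash
# 5 lines of context before and 20 after each FK violation; the file name
# here is a placeholder for wherever your server logs end up.
grep -i -B 5 -A 20 "fk_automation_bucket__automation_id__automation" prefect-server.log
```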
- Send a small log snippet around the FK violation and we can help pinpoint the offending automation and code path. If we can reproduce or isolate a pattern, we'll get a bug filed.

A couple of quick questions to help narrow the root cause:
- Have any automations been deleted recently, especially close to the time of the first FK errors?
- Are you seeing the same automation id repeated in the errors, or multiple ids?
- Do the errors cluster around specific event types?

If you share a snippet of the server logs around the violation (plus the output of the orphan-check query), I'll help confirm whether this is pure data-integrity fallout vs. a transaction/exception-handling bug that should be resolved by the upgrade.
a
@Marvin We have set backgroundServices.runAsSeparateDeployment=true. If we upgrade to the latest 3.4.x, do we also need to run Redis? If yes, suggest an optimal configuration.
m
thought for 139.1 seconds
Great question. That setting looks like a Helm chart value that runs Prefect's background services (scheduler, triggers/automations, late-run marker, etc.) as a separate Kubernetes Deployment. Upgrading to the latest 3.4.x does not introduce a Redis requirement for this mode.

Summary:
- Do you need Redis? No. Prefect Server and its background services coordinate via Postgres. Redis is optional and only needed if you explicitly choose Redis-backed features (e.g., using the prefect-redis integration for caching or as a concurrency provider).
- When would Redis help? Only if you want Redis-specific functionality (cache, locks, or a Redis-backed concurrency provider) for scale/latency reasons; it's not required for automations or the scheduler.

Recommended configuration (self-hosted, K8s/Helm)
- API server
  - 2+ replicas behind a stable Service/Ingress for HA.
  - Ensure DB migrations run during upgrades.
- Background services
  - Keep running them as a separate deployment (your current setup). Start with 1 replica to avoid duplicate work; scale to 2 only if you've validated your workload behaves well with multiple service instances.
- Database
  - Cloud SQL Postgres 15 is fine. Run migrations on upgrade and consider a slightly higher DB timeout for large datasets.
- Env vars to consider:
```bash
# Apply DB migrations automatically on start
PREFECT_SERVER_DATABASE_MIGRATE_ON_START=true

# Helpful for large DBs during migrations
PREFECT_API_DATABASE_TIMEOUT=600

# Enable extra logging if you need to diagnose service behavior
PREFECT_LOGGING_LEVEL=DEBUG
```
Example Helm values sketch (adjust to your chart’s schema)
```yaml
server:
  api:
    replicas: 2
    env:
      - name: PREFECT_SERVER_DATABASE_MIGRATE_ON_START
        value: "true"
      - name: PREFECT_API_DATABASE_TIMEOUT
        value: "600"

  backgroundServices:
    enabled: true
    runAsSeparateDeployment: true
    replicas: 1
    env:
      - name: PREFECT_LOGGING_LEVEL
        value: "INFO"
```
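After applying values like these, a quick sanity check that both Deployments rolled out cleanly; the namespace and Deployment names below are illustrative and depend on your Helm release:
```bash
# Names/namespace are illustrative placeholders for your release.
kubectl -n prefect get deployments
kubectl -n prefect rollout status deployment/prefect-server
kubectl -n prefect logs deployment/prefect-server --tail=50
```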
Links:
- Server/DB CLI docs: Server CLI docs
- 3.4 release notes: Docs: Prefect 3.4 release notes
- Redis integration (optional): Prefect Redis integration

If you are using the official Helm chart, I'm happy to tailor a concrete values.yaml snippet; feel free to share your current values for `server.api`, `server.backgroundServices`, and any env you're setting.
a
@Marvin Then why, in the Helm chart, here https://github.com/PrefectHQ/prefect-helm/blob/main/charts/prefect-server/values.yaml#L602, do I see the following:
```yaml
# if backgroundServices.runAsSeparateDeployment=true, you must set this to true or provide your own redis instance
```
m
thought for 144.4 seconds
Great catch, and thanks for pointing to the exact line in the Helm chart. Short answer: that comment is correct. When you run background services as a separate Deployment, you need a shared message bus across Pods. By default Prefect uses an in-memory broker, which only works within a single process/pod. Once you split the API and background services into separate Deployments, they can't share the in-memory bus, so you must provide a network-accessible broker. The Helm chart's supported broker is Redis, hence the note: enable the bundled Redis or point Prefect to your own Redis.

Why this is the case
- Single deployment/pod: the in-memory broker is fine.
- Separate deployments/pods: you need a distributed broker; the Helm chart uses Redis Streams via the prefect-redis integration.
- Internals: the broker is configured via the server "events" settings; the defaults are in-memory:
  - `PREFECT_SERVER_EVENTS_MESSAGING_BROKER` (default: `prefect.server.utilities.messaging.memory`)
  - `PREFECT_SERVER_EVENTS_MESSAGING_CACHE` (default: `prefect.server.utilities.messaging.memory`)
  - Redis broker module: `prefect_redis.messaging`
Links:
- Helm values (line you referenced): prefect-helm values.yaml
- Redis integration: Prefect Redis integration
- Settings reference (events/messaging): Settings reference

Recommended configurations

Option A: Keep background services separate (your current setup)
- Enable chart-managed Redis (easiest):
  - Set `redis.enabled: true` in the Helm values. The chart wires the API and background services to use Redis for the events broker.
  - Ensure persistence/auth for Redis in production.
- Or use your own Redis (e.g., Memorystore):
  - Set `redis.enabled: false`
  - Configure the server and background services to use the Redis broker:
    - `PREFECT_SERVER_EVENTS_MESSAGING_BROKER=prefect_redis.messaging`
    - `PREFECT_SERVER_EVENTS_MESSAGING_CACHE=prefect_redis.messaging`
  - Provide the Redis connection details as environment variables supported by the chart (or via a secret) according to the prefect-redis integration (see the external-Redis sketch after the values.yaml example below).
  - Important: make sure the Prefect server image has the redis extra installed so it can import the broker: add `prefect[redis]` to your image, or use the Helm values to add extra pip packages to both the API and background services.

Option B: Collapse background services back into the API pod
- Set `backgroundServices.runAsSeparateDeployment=false`
- No Redis required; the default in-memory broker works when everything is in one pod.
- This trades decoupling/HA of the services for simpler ops.

A practical values.yaml sketch (chart-managed Redis):
```yaml
server:
  api:
    replicas: 2
    extraPipPackages:
      - prefect[redis]
  backgroundServices:
    enabled: true
    runAsSeparateDeployment: true
    replicas: 1
    extraPipPackages:
      - prefect[redis]
  extraEnv:
    - name: PREFECT_SERVER_EVENTS_MESSAGING_BROKER
      value: prefect_redis.messaging
    - name: PREFECT_SERVER_EVENTS_MESSAGING_CACHE
      value: prefect_redis.messaging

redis:
  enabled: true
  architecture: standalone
  auth:
    enabled: true
    password: "<your-password>"
  master:
    persistence:
      enabled: true
      size: 8Gi
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 1
        memory: 1Gi
```
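And for the external-Redis path (Option A, second bullet), a rough values sketch: the two broker/cache settings come from the settings listed earlier, while the `PREFECT_REDIS_MESSAGING_*` connection variables are assumed names from the prefect-redis integration and should be verified against its settings reference before relying on them:
```yaml
server:
  extraEnv:
    # Broker/cache modules (settings discussed above)
    - name: PREFECT_SERVER_EVENTS_MESSAGING_BROKER
      value: prefect_redis.messaging
    - name: PREFECT_SERVER_EVENTS_MESSAGING_CACHE
      value: prefect_redis.messaging
    # Connection details for your own Redis (e.g., Memorystore).
    # NOTE: the variable names below are assumptions; confirm them against
    # the prefect-redis settings reference.
    - name: PREFECT_REDIS_MESSAGING_HOST
      value: "10.0.0.5"
    - name: PREFECT_REDIS_MESSAGING_PORT
      value: "6379"
    - name: PREFECT_REDIS_MESSAGING_DB
      value: "0"

redis:
  enabled: false  # using an external Redis instead of the chart-managed one
```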
If you prefer external Redis (e.g., GCP Memorystore), I can tailor the env/values for your chart version. Let me know:
- The Helm chart version you're using
- Whether you want the chart-managed Redis or an external endpoint
- If your server image already includes `prefect[redis]` or if you want to add it via Helm values

I'll provide an exact values.yaml for your setup.
a
@Marvin The Helm chart does not seem to include extraPipPackages. Does the prefecthq/prefect:3.4.22-python3.11 image include Redis by default?
m
thought for 224.6 seconds
Short answer: No. The Helm chart does not expose an `extraPipPackages` option, and the `prefecthq/prefect:3.4.22-python3.11` image does not include Redis (client or server) by default.

Details and options:
- The Redis integration for Prefect is a separate package (`prefect-redis`) and relies on the `redis` Python client. Install them explicitly if you need them.
- Recommended approach with Helm: build a small custom image that extends the Prefect base image, then point the chart to it. Example Dockerfile:
```dockerfile
FROM prefecthq/prefect:3.4.22-python3.11
RUN pip install --no-cache-dir prefect-redis redis
```
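To get that image into your cluster, the usual build-and-push flow applies; the registry and tag below are placeholders:
```bash
# Placeholder registry/tag; use whatever registry your cluster can pull from.
docker build -t <registry>/prefect-server-redis:3.4.22 .
docker push <registry>/prefect-server-redis:3.4.22
```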
Then push the image and set the image fields in your Helm values for the chart you’re using (server and/or worker). The exact keys are in the chart’s values; see the chart source: - prefect-helm repo - Prefect 3 Helm deployment guide If you’re talking about packages needed by your flow runs on Kubernetes (not the API server), you can also inject packages at runtime via job infrastructure using `EXTRA_PIP_PACKAGES`: - Kubernetes job infrastructure guide Clarifying question: do you need Redis for your flow code (client/integration), or are you expecting a Redis server to be present? The Prefect images don’t include a Redis server—run it as a separate service if required.