Ryan Sharp
08/20/2025, 3:19 PM
Marvin
08/20/2025, 3:19 PM
Nate
08/20/2025, 3:22 PM
Nate
08/20/2025, 3:23 PM
task-run.Completed events coming through?
Marvin
08/20/2025, 3:25 PM
PREFECT_API_URL must be reachable from worker pods and correct for your deployment. The typical shape is http(s)://<host>/api (include the base path if you serve Prefect under one, e.g. http://<host>/prefect/api).
- If you’re on a subpath, ensure the server/base path settings are applied consistently so Prefect derives the correct WebSocket URL from the API URL.
3) Make sure your ingress permits WebSockets and long-lived connections
- NGINX Ingress (example annotations):
- nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
- nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
- nginx.ingress.kubernetes.io/proxy-buffering: "off"
- ALB/Traefik: enable/allow WebSockets and increase idle timeouts similarly.
- After fixing, you should see “Websocket connected” and fewer/no “unconfirmed events” in worker logs.
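For an end-to-end check of that path, here is a minimal probe sketch. It assumes pip install websockets and that the events ingest socket lives at <PREFECT_API_URL>/events/in (the path recent Prefect 3.x clients use; verify against your version). A successful handshake means your ingress is passing WebSocket upgrades:

import asyncio
import os

import websockets

api_url = os.environ["PREFECT_API_URL"].rstrip("/")  # e.g. http://prefect.example.com/api
ws_url = api_url.replace("https://", "wss://", 1).replace("http://", "ws://", 1) + "/events/in"

async def probe() -> None:
    try:
        async with websockets.connect(ws_url, open_timeout=10):
            print(f"websocket handshake OK: {ws_url}")
    except Exception as exc:
        # An HTTP 4xx here still proves the proxy forwarded the upgrade;
        # a timeout or 502 points at the ingress/proxy itself.
        print(f"websocket handshake FAILED: {exc!r}")

asyncio.run(probe())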
4) If you use Redis for messaging
- There’s an open thread where tasks/subflows vanished from the UI with Redis messaging because the recorder wasn’t receiving events: Issue #18540. If that’s your setup, compare your config and logs to that issue.
5) Check server logs for the event services
- Look for the task run recorder service starting and processing events; errors here point to the root cause.
- Docs: Server (Helm) docs for configuring services and base paths.
If you can share:
- Prefect versions (server and workers)
- Your PREFECT_API_URL and whether you’re serving under a subpath
- Which ingress controller you’re using (NGINX/ALB/Traefik) and any relevant annotations
- Whether you’ve enabled Redis messaging
…I can point you to the exact knobs to turn. Otherwise, yes: it ran; the UI is just waiting for the news to arrive. Patience is a virtue. Unfortunately.
Ryan Sharp
08/20/2025, 3:38 PM
Ryan Sharp
08/20/2025, 3:41 PM
Nate
08/20/2025, 3:49 PM
Nate
08/20/2025, 3:49 PM
> I wasn't able to see any of the task-run.Completed events coming through, but the UI does update after about 15 minutes to show they completed
hm, it sounds like they eventually showed up then?
Ryan Sharp
08/20/2025, 3:56 PM
Ryan Sharp
08/20/2025, 4:04 PM
Marvin
08/20/2025, 4:04 PM
Marvin
08/20/2025, 4:07 PM
- PREFECT_MESSAGING_BROKER=prefect_redis.messaging
- PREFECT_MESSAGING_CACHE=prefect_redis.messaging
- PREFECT_REDIS_MESSAGING_HOST, PREFECT_REDIS_MESSAGING_PORT, PREFECT_REDIS_MESSAGING_DB, and auth/SSL vars as needed
- Verify the prefect-redis integration is installed in the images running the API and background services. If it isn’t, Prefect can silently fall back to in-memory messaging on some pods, which breaks acks in HA.
- Redis mode support
- Prefect’s Redis messaging uses Redis Streams semantics. Redis “cluster” mode can be… uncooperative with streams unless keys are carefully hash-tagged to the same slot. If you’re on Redis Enterprise Cluster or OSS cluster, try:
- Pointing Prefect at a proxy/single endpoint that presents a non-sharded view, or
- Testing with a single-writer Redis (Bitnami primary or Enterprise with a proxy endpoint) to confirm cluster sharding is the cause.
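A quick way to confirm which mode you are actually talking to, as a sketch (assumes pip install redis; the hostname below is a placeholder for your endpoint):

import redis

# Placeholder endpoint: substitute the host/port/auth Prefect is configured with.
r = redis.Redis(host="redis-master.prefect.svc.cluster.local", port=6379)
print("cluster_enabled:", r.info("cluster").get("cluster_enabled"))  # 1 = cluster mode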
- There’s an open report where Redis messaging caused tasks/subflows to go missing in the UI: Issue #18540. Your symptoms are very similar.
- TaskRunRecorder running and connected
- In HA, the TaskRunRecorder service must be running in your background services and reading from the same messaging broker as the API pods. If it’s not running or cannot connect to Redis, task events won’t be recorded promptly (hence Pending in UI).
- Check logs in the background services for:
- “TaskRunRecorder started”
- Connection errors to Redis
- Backlogs/consumer errors
- Prefect server services overview: Server (Helm) docs
- API WebSocket path and acks
- Even though you’re not using ingress for API, confirm worker logs show “Websocket connected” with stable connections to the API Service and not constant reconnects.
- Since you have multiple API pods, make sure all API pods are identically configured for messaging. If some API pods run with in-memory messaging while others use Redis, acks will be inconsistent.
Quick isolations to prove the cause
- Temporarily disable Redis messaging (let Prefect default to in-memory) across all pods in HA. If the issue disappears, Redis messaging config or Redis mode is the culprit.
- Keep Redis messaging on but point to a single non-cluster Redis (Bitnami primary only) and retest. If the problem vanishes, it’s Redis Cluster behavior with streams.
Also, stay current
- You tested 3.4.10 and 3.4.13. Stay on 3.4.13+ for the base-path/internal client fixes and other service-reliability improvements:
- Latest releases: Release notes
If you can share:
- The exact env vars on an API pod and on a background-services pod (the PREFECT_MESSAGING_* and PREFECT_REDIS_* set)
- Whether your Redis endpoint is true Cluster mode vs a proxy/single endpoint
- Logs from the TaskRunRecorder pod during a run
…I can call out the exact misconfiguration. But right now, my money’s on mixed messaging backends across pods or Redis Cluster stream behavior. Delightful, I know.
Ryan Sharp
08/20/2025, 4:23 PM
Marvin
08/20/2025, 4:23 PM
Marvin
08/20/2025, 4:27 PM
- PREFECT_SERVER_SERVICES_TASK_RUN_RECORDER_ENABLED=true
- PREFECT_SERVER_SERVICES_SCHEDULER_ENABLED=true
- PREFECT_SERVER_SERVICES_REPOSSESSOR_ENABLED=true
- After restart, you should see logs like “TaskRunRecorder started” in that pod. If not, it’s not being launched.
- Ensure the background-services pod has the same DB config as API pods
- It must write to the same Postgres as the API. If the DB URL isn’t set there, the recorder can’t persist runs.
- Confirm your Postgres env (e.g., PREFECT_API_DATABASE_CONNECTION_URL) is present in the background-services pod.
- Redis messaging config consistency
- You’ve set broker/cache to Redis — good. Explicitly add PREFECT_REDIS_MESSAGING_DB (e.g., 0) so every pod uses the same DB.
- The log “No active consumer groups found for stream actions” suggests no consumer is running for that stream — i.e., your background services that should consume from Redis aren’t actually up (again pointing at services not starting).
- Make sure the background-services image includes the redis integration
- The log line from prefect.prefect_redis.messaging implies it’s there, but double-check the image build. Official images need the prefect-redis extra or package installed (e.g., pip install 'prefect-redis>=0.2' or use prefect[redis]) for all server images, not just workers.
- Sanity check WebSocket connectivity from workers
- In worker logs you should see “Websocket connected” and not a constant reconnect loop. Unconfirmed events usually mean the server isn’t acking — because the recorder/event services aren’t alive.
Useful references
- Settings reference (service enable flags, DB, messaging): Settings reference
- Helm/server overview: Server (Helm) docs
- Related Redis messaging issue (symptoms match): Issue #18540
If you share:
- The background-services Deployment spec (image, command/args, env)
- Whether PREFECT_API_DATABASE_CONNECTION_URL is set in that pod
- A few minutes of its logs after restart
…I can pinpoint why the recorder isn’t starting. But right now, the fix is to explicitly enable TaskRunRecorder and ensure that pod can reach both Redis and your Postgres. Without the recorder running, the UI will continue to contemplate “Pending” for far too long, much like the rest of us.
Ryan Sharp
08/20/2025, 4:54 PM
Marvin
08/20/2025, 4:54 PM
Marvin
08/20/2025, 4:55 PM
- PREFECT_MESSAGING_BROKER=prefect_redis.messaging
- PREFECT_MESSAGING_CACHE=prefect_redis.messaging
- PREFECT_REDIS_MESSAGING_HOST=redis.svc.cluster.local
- PREFECT_REDIS_MESSAGING_PORT=6379
- PREFECT_REDIS_MESSAGING_DB=0
- PREFECT_REDIS_MESSAGING_USERNAME=default (if applicable)
- PREFECT_REDIS_MESSAGING_PASSWORD=...
- Make sure background-services is actually running the recorder:
- PREFECT_SERVER_SERVICES_TASK_RUN_RECORDER_ENABLED=true
- Also keep scheduler/repossessor enabled in HA:
- PREFECT_SERVER_SERVICES_SCHEDULER_ENABLED=true
- PREFECT_SERVER_SERVICES_REPOSSESSOR_ENABLED=true
- And that background-services has the same Postgres URL as API pods.
- What “healthy” looks like
- Background-services logs should show the recorder starting (not just Redis group housekeeping).
- Worker logs should show “Websocket connected” and minimal “unconfirmed events”.
- Redis should show consumer groups on expected streams:
- XINFO GROUPS events
- XINFO GROUPS actions
- XPENDING events should not trend upward indefinitely.
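If you’d rather script those checks than run redis-cli by hand, here is a minimal redis-py sketch (assumes pip install redis and the PREFECT_REDIS_MESSAGING_* variables below; the stream names events and actions match the ones above):

import os
import redis

r = redis.Redis(
    host=os.environ.get("PREFECT_REDIS_MESSAGING_HOST", "localhost"),
    port=int(os.environ.get("PREFECT_REDIS_MESSAGING_PORT", "6379")),
    db=int(os.environ.get("PREFECT_REDIS_MESSAGING_DB", "0")),
    password=os.environ.get("PREFECT_REDIS_MESSAGING_PASSWORD"),
    decode_responses=True,
)

for stream in ("events", "actions"):
    try:
        for group in r.xinfo_groups(stream):
            # "pending" = delivered but unacked; it should not grow unbounded
            print(f"{stream}: group={group['name']} consumers={group['consumers']} pending={group['pending']}")
    except redis.ResponseError as exc:
        print(f"{stream}: {exc}")  # e.g. the stream does not exist yet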
- Kubernetes notes
- Service endpoints only; you don’t need ingress for API-to-Redis.
- If you use Sentinel, point Prefect at the sentinel-aware endpoint (managed by your chart) so failover is automatic but the hostname remains stable.
- Keep Redis in the same cluster/region/namespace to minimize latency.
If you want, I can share a minimal Bitnami Redis + Sentinel values sketch or check your background-services Deployment to confirm the recorder and messaging settings are wired correctly.
References:
- Settings reference: Settings reference
- Server/Helm overview: Server (Helm) docs
- Related Redis messaging issue: Issue #18540
I realize “disable Redis and it works” is tempting. For HA, though, a single endpoint, non-clustered Redis with persistence and the recorder service actually running is the boring, reliable option—my favorite kind.
Nate
08/20/2025, 4:56 PM
Nate
08/20/2025, 4:57 PM
Ryan Sharp
08/20/2025, 5:04 PM
Ryan Sharp
08/20/2025, 5:57 PM
Marvin
08/20/2025, 5:57 PM
Marvin
08/20/2025, 5:59 PM
- In both an API pod and the background-services pod, run:
prefect config view --show-defaults --show-sources
- Confirm these are present and identical in both:
- PREFECT_MESSAGING_BROKER=prefect_redis.messaging
- PREFECT_MESSAGING_CACHE=prefect_redis.messaging
- PREFECT_REDIS_MESSAGING_HOST=redis-master.prefect.svc.cluster.local
- PREFECT_REDIS_MESSAGING_PORT=6379
- PREFECT_REDIS_MESSAGING_DB=0 ← you’re missing this; set it explicitly
- PREFECT_REDIS_MESSAGING_USERNAME (set to default if your Redis uses ACLs)
- PREFECT_REDIS_MESSAGING_PASSWORD=...
- If any API pod is missing these, it’ll publish to in-memory and never reach Redis.
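To make that comparison mechanical, here is a small sketch you can run in each pod and diff line by line (the key list is an assumption; extend it with whatever settings you rely on):

import os

KEYS = (
    "PREFECT_MESSAGING_BROKER",
    "PREFECT_MESSAGING_CACHE",
    "PREFECT_REDIS_MESSAGING_HOST",
    "PREFECT_REDIS_MESSAGING_PORT",
    "PREFECT_REDIS_MESSAGING_DB",
    "PREFECT_REDIS_MESSAGING_USERNAME",
)

for key in KEYS:
    print(f"{key}={os.environ.get(key, '<unset>')}")

# Don't print the secret itself; just confirm it is present.
print("PREFECT_REDIS_MESSAGING_PASSWORD set:", "PREFECT_REDIS_MESSAGING_PASSWORD" in os.environ)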
- Confirm the recorder is running in background-services
- With DEBUG logging, you should see something like “TaskRunRecorder started”.
- If you don’t, the service isn’t starting. Keep:
- PREFECT_SERVER_SERVICES_TASK_RUN_RECORDER_ENABLED=true
- PREFECT_SERVER_SERVICES_SCHEDULER_ENABLED=true
- PREFECT_SERVER_SERVICES_REPOSSESSOR_ENABLED=true
- Check Redis for streams and groups
- From a shell with redis-cli access:
XINFO STREAM events
XINFO GROUPS events
XINFO STREAM actions
XINFO GROUPS actions
- You should see consumer groups for Prefect. If “events” has no groups while “actions” does, your recorder isn’t attached to events, or the distributor isn’t pushing.
- Look for publish/consumer errors in API and background logs
- API pods: search for errors from prefect.server.events.messaging when creating publishers or publishing (if they can’t publish, they won’t ack workers → “unconfirmed events”).
- Background-services: search for TaskRunRecorder and any consumer errors.
- Ensure prefect-redis is installed in your server images
- Inside an API and the background pod:
python -c "import prefect.prefect_redis; print('ok')"
- If that import fails, the env vars will be ignored and you’ll silently fall back to in-memory on that pod. Install the extra in the image (e.g., pip install 'prefect-redis>=0.2' or use a Prefect image built with the redis extra).
Minimal env set I recommend on all server-side pods (API + background-services)
PREFECT_SERVER_SERVICES_TASK_RUN_RECORDER_ENABLED=true
PREFECT_SERVER_SERVICES_SCHEDULER_ENABLED=true
PREFECT_SERVER_SERVICES_REPOSSESSOR_ENABLED=true
PREFECT_MESSAGING_BROKER=prefect_redis.messaging
PREFECT_MESSAGING_CACHE=prefect_redis.messaging
PREFECT_REDIS_MESSAGING_HOST=redis-master.prefect.svc.cluster.local
PREFECT_REDIS_MESSAGING_PORT=6379
PREFECT_REDIS_MESSAGING_DB=0
# If Redis ACLs are enabled:
# PREFECT_REDIS_MESSAGING_USERNAME=default
PREFECT_REDIS_MESSAGING_PASSWORD=...
If, after this, you still don’t see “TaskRunRecorder started”, paste the first 100 lines of the background-services logs after a restart. If “events” has no consumer groups, paste XINFO GROUPS events. We’ll zero in on whether the API isn’t publishing or the recorder isn’t consuming.
Yes, it’s a bit of “everything must match.” HA is thrilling like that.Marvin
08/20/2025, 6:22 PM
Marvin
08/20/2025, 6:22 PM
Ryan Sharp
08/20/2025, 6:24 PM
Marvin
08/20/2025, 6:24 PM
Marvin
08/20/2025, 6:28 PM
Client side
1) What API are you pointing at?
prefect config view --show-sources
Ensure PREFECT_API_URL is your self-hosted Server URL. If this points elsewhere, you’re writing runs to a different API than the UI.
2) Do runs actually exist at that API?
prefect flow-run ls --limit 10
- If you see runs here but not in the UI, the UI is pointed at the wrong API.
- If this is empty, you haven’t created runs at this API yet.
3) Are events flowing?
prefect events stream --format text
You should see events for flow/task state changes when you trigger a run.
Server side
4) Migrations
prefect server database upgrade
If migrations weren’t applied, the UI will be quite… minimal.
5) Services
prefect server services ls
You should see event-persister, task-run-recorder, etc. If you’re running in Kubernetes, also check pod logs:
kubectl logs deploy/prefect-server -c event-persister
kubectl logs deploy/prefect-server -c task-run-recorder
Look for DB errors or backoffs. Pay special attention to PREFECT_API_DATABASE_CONNECTION_URL.
UI/API alignment
6) Verify what API the UI is calling
- In your browser dev tools, Network tab, filter for /api/ – do those requests succeed, and what host are they hitting?
- In your UI container/Helm values, verify the API URL env (commonly PREFECT_API_URL or a VITE_* API URL) matches your server API endpoint. A mismatch here = empty UI.
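For a scriptable version of the same check, a minimal sketch (assumes pip install httpx and that your server exposes a health endpoint under the API base path, as Prefect Server does at /api/health):

import os
import httpx

api_url = os.environ["PREFECT_API_URL"].rstrip("/")
resp = httpx.get(f"{api_url}/health", timeout=10)
print(resp.status_code, resp.text)  # expect a 200 if the API is reachable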
Sanity test
7) Minimal flow against your Server
- On a machine that can reach your Server, set PREFECT_API_URL to your Server’s API.
- Run:
from prefect import flow

@flow
def hello():
    return "hi"

if __name__ == "__main__":
    hello()
You should see a flow run in prefect flow-run ls and in the UI within a few seconds.
A few notes on your Redis entries
- XREADGROUP/XAUTOCLAIM from those consumer groups are normal. They indicate the services are polling/claiming messages; they do not confirm persistence into Postgres or that flow/task runs exist.
Docs and references
- Server services and event persister API reference: Event persister
- Managing settings (e.g., PREFECT_API_URL): Manage settings
- Settings reference: Settings reference
- Workspaces (if you’re on Cloud): Cloud workspaces
If you share:
- How you’re running Prefect (Server vs Cloud, Helm chart/version)
- The PREFECT_API_URL from both your worker/flow environment and the UI
- The output of prefect flow-run ls --limit 10
- Any errors from the event-persister/task-run-recorder logs
…I can point you to the exact loose screw. I know, thrilling.
Ryan Sharp
08/20/2025, 7:29 PM
Marvin
08/20/2025, 7:29 PM
Nate
08/20/2025, 7:29 PM
Ryan Sharp
08/20/2025, 7:31 PM
Marvin
08/20/2025, 7:32 PM
- A mismatched PREFECT_API_DATABASE_CONNECTION_URL
- Migrations not applied in the environment the services write to
- Double-running services in different places with conflicting configs
Targeted checks
1) Compare env between API and services pods
- Print and diff:
- Redis URL (host, port, db index, password, scheme)
- PREFECT_API_DATABASE_CONNECTION_URL
- Any settings you use to pick the events broker (ensure both sides pick Redis in the Redis test)
- In Kubernetes:
kubectl exec -it <api-pod> -- env | sort
kubectl exec -it <services-pod> -- env | sort
2) Verify both sides see the same Redis stream
- From the API pod and the services pod:
redis-cli -u "$REDIS_URL" XINFO STREAM events
redis-cli -u "$REDIS_URL" XINFO GROUPS events
You should see the same stream metadata and groups, including event-persister and task-run-recorder. If one side shows a different DB index (empty stream) or can’t auth, that’s the smoking gun.
3) Check background service logs for DB errors
kubectl logs <services-pod> -c event-persister
kubectl logs <services-pod> -c task-run-recorder
Look for psycopg connection failures, permission errors, or “relation … does not exist” (migrations).
4) Confirm migrations
prefect server database upgrade
Run this against the same DB the services use. A missing alembic migration will make the persister/recorder unhappy and the UI very quiet.
5) Ensure only one broker is configured for a given test
- If the API is configured for Redis but the services default to in-memory (or vice versa), they will never meet.
6) With multiple API replicas
- It’s fine for services to run as a single separate process; just don’t run them both in the API and as a separate deployment with conflicting settings.
CLI and docs that help
- View current settings: prefect config view --show-sources
- Server services: prefect server services ls / prefect server services start
- DB migrations: prefect server database upgrade
- Event persister API ref: Event persister
- Settings reference: Settings reference
If you can share:
- The exact Redis URL used by API vs services (including db index and scheme)
- The Postgres URL env in the services pod
- Any errors from event-persister/task-run-recorder logs
…I can point to the precise misconfiguration. I know, the suspense is unbearable.
Ryan Sharp
08/20/2025, 7:56 PM
Marvin
08/20/2025, 7:56 PM
Marvin
08/20/2025, 7:59 PM
- Set PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=true
- Then confirm services are actually running:
prefect server services ls
You should see Event Persister, Task Run Recorder, Scheduler, etc. “running.”
Sanity checks (just in case)
- Verify both pods use the same image/tag (so they share the same messaging plugin and settings).
- Confirm DB connectivity from the services pod. If you’ve never applied migrations in this env:
prefect server database upgrade
- Ensure both API and services agree on Redis settings:
- PREFECT_MESSAGING_BROKER=prefect_redis.messaging
- PREFECT_REDIS_MESSAGING_HOST/PORT/PASSWORD/DB
- If you want to be thorough, from the services pod:
redis-cli -h redis-master.prefect.svc.cluster.local -a redispassword -n 0 XINFO GROUPS events
You should see the consumer groups created by the event persister and task-run recorder.
References
- Settings reference (service enable flags): Settings reference
- Event Persister service API ref: Event persister
Flip that one flag and your “inconsistent” matrix should become consistent, in the satisfying way that makes everyone else yawn.
Ryan Sharp
08/20/2025, 8:04 PM
Marvin
08/20/2025, 8:04 PM
check_cli_command
📊 Tools used:
• check_cli_command (3x)
• display_callable_signature (1x)
• explore_module_offerings (1x)
• research_prefect_topic (3x)
• search_github_discussions (1x)
Ryan Sharp
08/20/2025, 8:16 PM
Marvin
08/20/2025, 8:16 PM
Nate
08/20/2025, 8:16 PM
Nate
08/20/2025, 8:17 PM
Nate
08/20/2025, 8:18 PM
Marvin
08/20/2025, 8:20 PM
- Set PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=false to disable.
- Example (Docker Compose):
services:
  prefect-server:
    environment:
      PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED: "false"
- Example (Kubernetes):
env:
  - name: PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED
    value: "false"
- Tuning knobs (if you need to adjust throughput/latency):
- PREFECT_SERVER_SERVICES_EVENT_PERSISTER_BATCH_SIZE
- PREFECT_SERVER_SERVICES_EVENT_PERSISTER_FLUSH_INTERVAL
- These control batching and flush cadence for event writes.
Where events are stored:
- In your Prefect Server database (PostgreSQL for production). The server’s event storage layer writes/reads via the events database helpers (e.g., write_events, read_events).
How to verify it’s running:
- Logs: Set PREFECT_LOGGING_LEVEL=DEBUG on the API/background services; you should see the Event Persister start and periodic batches being processed.
- API probe: Hit the events endpoint to confirm you’re receiving data
curl -s "$PREFECT_API_URL/events?limit=1"
If your PREFECT_API_URL already ends with /api, just append /events.
Relevant docs:
- Event Persister (API reference): Event Persister service (API ref)
- Settings reference (env vars): Settings reference
- Events API endpoints: Events API (server)
If you’re looking to disable it for a specific service or tune batch/flush settings, tell me how you’re deploying Prefect (Docker Compose, Helm, bare processes) and I can share exact snippets.
Ryan Sharp
08/20/2025, 10:37 PM
Ryan Sharp
08/21/2025, 6:40 PM
Marvin
08/21/2025, 6:40 PM
explore_module_offerings
📊 Tools used:
• display_callable_signature (1x)
• explore_module_offerings (3x)
• research_prefect_topic (1x)