Hello <@ULVA73B9P>! I'm having some issues with f...
# marvin-ai
p
Hello @Marvin! I'm having some issues with false positives in my zombie flow detection automation. Here's the json:
{
  "type": "event",
  "match": {
    "prefect.resource.id": [
      "prefect.flow-run.*"
    ]
  },
  "match_related": {},
  "after": [
    "prefect.flow-run.heartbeat"
  ],
  "expect": [
    "prefect.flow-run.Failed",
    "prefect.flow-run.Crashed",
    "prefect.flow-run.heartbeat",
    "prefect.flow-run.Cancelled",
    "prefect.flow-run.Completed"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Proactive",
  "threshold": 1,
  "within": 90
}
I have confirmed that PREFECT_RUNNER_HEARTBEAT_FREQUENCY is set to 30. It seems to happen when a flow starts, emits one heartbeat, and then succeeds or fails before the next heartbeat. This still triggers the automation.
m
Thanks for sharing the config and details; that helps. What you’re seeing is usually caused by anchoring the expectation window on heartbeats. With after: ["prefect.flow-run.heartbeat"], the automation opens a new “expectation window” every time a heartbeat event is observed. In very fast runs, a window can open at an unlucky moment (e.g., on a trailing or late heartbeat near completion). If no further heartbeat or terminal event arrives after that heartbeat, the window times out and the automation fires even though the run has already reached a terminal state.

Two ways to avoid these false positives:

1) Anchor on Running instead of heartbeat

This pattern expects either a heartbeat or a terminal state shortly after the run enters Running. It’s robust for fast runs and avoids spurious windows started by late or lone heartbeats.
{
  "type": "event",
  "match": {
    "prefect.resource.id": [
      "prefect.flow-run.*"
    ]
  },
  "match_related": {},
  "after": [
    "prefect.flow-run.Running"
  ],
  "expect": [
    "prefect.flow-run.heartbeat",
    "prefect.flow-run.Completed",
    "prefect.flow-run.Failed",
    "prefect.flow-run.Cancelled",
    "prefect.flow-run.Crashed"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Proactive",
  "threshold": 1,
  "within": 60
}
- With PREFECT_RUNNER_HEARTBEAT_FREQUENCY=30, set within to something like 45–60s. This catches true zombies (no heartbeat and no terminal state after Running) but won’t trip for quick completions. Note that it only covers the first window after Running, not heartbeats that stop later in the run.

2) If you really want to monitor mid-run heartbeats continuously

Use the same “after Running” window above to ensure the first heartbeat or terminal event shows up, then add a second expectation: “after heartbeat, expect another heartbeat OR a terminal”. Do this only if you’re confident that no late heartbeats arrive after terminal states in your environment. If you see any false positives with that, prefer the single Running-anchored expectation above.

A couple of checks to confirm the root cause:
- In one of the false positives, open the Event feed/timeline for that flow run and confirm the order and timestamps of:
  - Running
  - heartbeat
  - Completed/Failed/Cancelled/Crashed
- If the window was opened by a heartbeat that arrived after the terminal event, you’ll see why it fired.

Notes and references:
- Prefect event automation semantics (after/expect/within/threshold/posture/for_each): Event triggers and expectations
- Zombie flow patterns and guidance: Detect zombie flows

If you can share one example run’s event timeline (the 3–4 events and their timestamps), I can validate the exact ordering and suggest a minimal tweak.
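To check a copied event timeline by hand, here is a minimal Python sketch of the window logic described above. It is a simplified model, not Prefect's actual evaluation code: the function name unmet_windows and the sample timestamps are hypothetical, and it assumes a window is opened by each "after" event and is met by any later "expect" event within "within" seconds.

```python
from datetime import datetime, timedelta

# Events that satisfy the expectation, as in the trigger JSON in this thread.
EXPECT = {
    "prefect.flow-run.heartbeat",
    "prefect.flow-run.Completed",
    "prefect.flow-run.Failed",
    "prefect.flow-run.Cancelled",
    "prefect.flow-run.Crashed",
}


def unmet_windows(events, after, expect, within_seconds):
    """Return the (timestamp, event) pairs whose window was never satisfied.

    events: list of (datetime, event_name) tuples for ONE flow run.
    Simplified model (assumption): every `after` event opens a window, and the
    window is met by any later `expect` event within `within_seconds`.
    """
    ordered = sorted(events, key=lambda e: e[0])
    window = timedelta(seconds=within_seconds)
    unmet = []
    for i, (opened_at, name) in enumerate(ordered):
        if name not in after:
            continue
        deadline = opened_at + window
        satisfied = any(
            later_name in expect and later_at <= deadline
            for later_at, later_name in ordered[i + 1:]
        )
        if not satisfied:
            unmet.append((opened_at, name))
    return unmet


# Hypothetical timeline: a fast run with a late heartbeat after completion.
t0 = datetime(2024, 1, 1, 12, 0, 0)
timeline = [
    (t0, "prefect.flow-run.Running"),
    (t0 + timedelta(seconds=1), "prefect.flow-run.heartbeat"),
    (t0 + timedelta(seconds=12), "prefect.flow-run.Completed"),
    (t0 + timedelta(seconds=13), "prefect.flow-run.heartbeat"),
]

# Heartbeat-anchored: the late heartbeat opens a window nothing can satisfy.
heartbeat_anchored = unmet_windows(
    timeline, {"prefect.flow-run.heartbeat"}, EXPECT, within_seconds=90
)

# Running-anchored: the single window is met by the first heartbeat.
running_anchored = unmet_windows(
    timeline, {"prefect.flow-run.Running"}, EXPECT, within_seconds=60
)

print(len(heartbeat_anchored), len(running_anchored))
```

In this model the heartbeat-anchored trigger leaves one unmet window (the late heartbeat) and the Running-anchored trigger leaves none. If your real timeline shows no heartbeat after the terminal event, this model would not predict a false positive, and the cause is likely something else.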
p
ok great! Going to test with #1 above - thanks!