# ask-marvin
h
Hey @Marvin, the webhook I set up to be called when a run's state is set to Crashed is not being called. Can you help?
m
thought for 475.0 seconds
I can help! A few quick checks will usually surface why a “crashed” webhook didn’t fire. Before we dig in, can you share:
- Are you on Prefect 3.x Cloud or self-hosted Server?
- Is your automation set to trigger on a Flow run or Task run entering the Crashed state?
- Any filters applied (tags, deployments, work pools)?
- The automation name and a recent flow run ID that you expected to trigger it

In the meantime, here’s a focused checklist to debug:

1) Confirm the run actually “Crashed” (not “Failed”)
- “Failed” = your code raised an exception cleanly
- “Crashed” = the process died unexpectedly (SIGKILL/OOM/container crash)
- In the UI, open the run detail and confirm the terminal state is Crashed (not Failed). See: States in Prefect 3

2) Verify the automation is enabled and the trigger matches
- In the UI (Automations): ensure the automation is Enabled and the trigger is “Flow run enters state” (or “Task run enters state”) equals “Crashed”. If you used filters (e.g., tags, deployment), confirm the run actually matches them.
- From the CLI, inspect the exact config:
prefect automation ls
prefect automation inspect "YOUR AUTOMATION NAME" --yaml
Docs: Automation CLI

3) Confirm the event occurred
- Go to the Events page and filter by the time window and the flow run. You should see a “flow run entered Crashed” style event for that run. If the event never occurred, the automation won’t fire.
- Event concepts: Event triggers

4) Check the action (Call Webhook) configuration
- Make sure the URL, method, headers, and body template are correct.
- Ensure the webhook endpoint returns a 2xx response; non-2xx will be recorded as an action failure.
- Set a sensible timeout and retries if needed.
- If you can, test the endpoint with curl or your API client to confirm it’s reachable from your network.
- How to create/configure a webhook: Create a webhook

5) Look for automation action results
- In the UI, open the automation and check recent runs/results. If the action failed, you’ll see error details there (status code, timeout, etc.).

6) Reproduce a true “crash” to test
- Raising an exception will mark the run as Failed, not Crashed. To simulate a crash, you can kill the process:
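import os
import signal

from prefect import flow


@flow
def crash():
    # SIGKILL cannot be caught, so the process dies without a clean exception.
    # Note: signal.SIGKILL is only available on POSIX systems.
    os.kill(os.getpid(), signal.SIGKILL)


if __name__ == "__main__":
    crash()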
- Run this the same way you run your normal flows (ideally as a deployment) so the automation can see it.

Common gotchas
- Triggering on the wrong resource (Task run vs Flow run)
- Filters too restrictive (tags/deployment mismatch)
- The run ended as Failed not Crashed
- Automation is paused/disabled
- Webhook endpoint blocked by firewall/VPC or requiring auth headers that weren’t set

If you share:
- The automation config (you can paste the prefect automation inspect --yaml output)
- A flow run ID that ended in Crashed and the approximate time
- The webhook URL domain (redact secrets) and whether it requires auth

…I can pinpoint the issue and suggest an exact fix.
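
For the endpoint check in step 4, here is a minimal sketch of a reachability test if you would rather use Python than curl. It uses only the standard library and is not part of Prefect; the function names, the example URL, and the payload are assumptions for illustration. It POSTs a small JSON body and returns the HTTP status code, so you can see whether the endpoint would answer with a 2xx.

```python
import json
import urllib.error
import urllib.request


def check_webhook(url, payload, headers=None, timeout=10.0):
    """POST a JSON payload to `url` and return the HTTP status code.

    Hypothetical helper for manual testing; it does not reproduce the
    exact request that a Prefect automation would send.
    """
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    # Add any auth headers your endpoint requires.
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return exc.code
    # Connection problems (DNS, firewall, timeout) raise URLError or
    # TimeoutError and are left to propagate so the cause stays visible.


def is_success(status):
    """Return True for 2xx status codes."""
    return 200 <= status < 300


# Example call (hypothetical URL and header value):
# status = check_webhook(
#     "https://example.com/hooks/prefect-crash",
#     {"test": True},
#     headers={"Authorization": "Bearer <redacted>"},
# )
# print(status, is_success(status))
```

Run it from the same network as the component that calls the webhook if you can, since a firewall or VPC rule may only affect that path. A non-2xx status here would match the "action failure" case described in step 4.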
h
My worker seems to be using the image prefecthq/prefect:3.4.17-python3.12. Could a version mismatch be causing an issue here? All of the above seems to be fine.