< Marvin> can I use automations to retry a flow that Crashed Prefect Community #ask-marvin

# ask-marvin

José Agretti

08/12/2025, 12:48 PM

@Marvin can I use automations to retry a flow that Crashed?

Marvin

08/12/2025, 12:50 PM

Short answer: yes. Use an Automation that triggers when a flow run enters the Crashed state and have it start a new run of the same deployment.

What this does (and doesn’t):
- It won’t resurrect the crashed run. It will submit a new run of the deployment (effectively a “retry”).
- Built-in `retries` on your flow/task only handle Failures raised by your code; they can’t help when the process crashes and the engine isn’t around to schedule a retry.

How to set it up (UI):
- Go to Automations → New automation
- Trigger: “Flow run state entered”
- Condition/filter: State type = Crashed (optionally filter by specific deployment or tags)
- Action: “Run deployment” (select the same deployment; optionally pass the same parameters)
- Add a cooldown window/rate limit so you don’t loop endlessly if crashes persist

Docs for reference:
- Automations overview: https://docs.prefect.io/latest/automations/
- States: https://docs.prefect.io/latest/concepts/states/

Notes for Vertex AI work pools (since that’s your setup): each Automation-triggered run will submit a fresh Vertex job. If you’re seeing repeated Crashes, add a cooldown and fix the underlying cause (image, package, resource limits) before letting it auto-loop.

If you’d prefer to do this programmatically or need help wiring parameter pass-through, tell me where you’re running (Prefect Cloud vs OSS) and I’ll sketch the exact steps.
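The setup steps can also be written down as plain data, which makes the pieces easier to review. This is a minimal sketch, not a verified Prefect schema: the trigger field names follow the JSON the Automations editor shows for an event trigger, while the action shape, the `cooldown_seconds` key, the automation name, and the `build_crash_retry_automation` helper are all assumptions made for illustration. The deployment ID is a placeholder.

```python
import json


def build_crash_retry_automation(deployment_id: str, cooldown_seconds: int = 600) -> dict:
    """Assemble a plain-dict description of a 'retry on Crashed' automation.

    Hypothetical helper: field names are illustrative assumptions,
    not a confirmed Prefect API payload.
    """
    trigger = {
        "type": "event",
        "match": {"prefect.resource.id": "prefect.flow-run.*"},
        "expect": ["prefect.flow-run.Crashed"],  # react only to Crashed runs
        "for_each": ["prefect.resource.id"],
        "posture": "Reactive",
        "threshold": 1,
        "within": 0,
    }
    action = {
        "type": "run-deployment",  # assumed action type name
        "source": "selected",      # assumed: run one explicitly chosen deployment
        "deployment_id": deployment_id,
    }
    return {
        "name": "retry-crashed-runs",          # hypothetical automation name
        "trigger": trigger,
        "actions": [action],
        "cooldown_seconds": cooldown_seconds,  # hypothetical stand-in for the UI cooldown
    }


automation = build_crash_retry_automation("00000000-0000-0000-0000-000000000000")
print(json.dumps(automation, indent=2))
```

Keeping the definition as data makes it easy to diff and review before entering the same values in the UI.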

José Agretti

08/12/2025, 12:55 PM

@Marvin how do I use the same parameters that the Crashed flow was using?

Marvin

08/12/2025, 12:56 PM

Yes. Have the Automation “Run deployment” action reuse the triggering run’s parameters via templating.

In the Automation:
- Trigger: Flow run state entered → Crashed
- Action: Run deployment → select the same deployment
- Parameters (JSON): set it to render the crashed run’s parameters:

{{ event.payload.flow_run.parameters | tojson }}

Notes:
- The action field supports Jinja templating with access to the event. In most orgs the parameters live at `event.payload.flow_run.parameters`. If your UI’s “Preview event data” shows a slightly different path, use that one.
- You can also template the run name if you like:

{{ event.payload.flow_run.name }} (retry)

- Add a cooldown/rate-limit so you don’t spin forever if it keeps crashing.

Refs:
- Automations: docs
- States/events: states and events

If you want to merge/override a parameter (e.g., set `{"retry": true}`), say the word and I’ll show a safe pattern.

José Agretti

08/12/2025, 1:09 PM

@Marvin How do I do this?

Add a cooldown/rate-limit so you don’t spin forever if it keeps crashing.

Marvin

08/12/2025, 1:10 PM

Two simple guardrails:

1) Automation cooldown
- Edit your Automation → scroll to Action policy → set Cooldown (e.g., 10m).
- This prevents the automation from firing again for any matching crash within that window, so you don’t end up machine‑gunning retries.

2) Tag-based loop breaker (recommended)
- In the Action → Run deployment, add a run tag like `auto-retry`.
- In the Trigger filters, add “Flow run tags does not contain `auto-retry`”.
- Result: only original runs can trigger the automation; the retried runs won’t retrigger if they crash too.

Optional belt-and-suspenders: add a concurrency limit tag to the deployment (value 1) so only one retry runs at a time. See https://docs.prefect.io/latest/concepts/concurrency/.
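To see how the two guardrails interact, here is a toy model in plain Python. It is only a sketch of the decision logic, not how Prefect implements cooldowns or filters; the `RetryGate` class and its method names are hypothetical.

```python
from datetime import datetime, timedelta


class RetryGate:
    """Toy model of the two guardrails: a cooldown window and a tag-based loop breaker."""

    def __init__(self, cooldown: timedelta, retry_tag: str = "auto-retry"):
        self.cooldown = cooldown
        self.retry_tag = retry_tag
        self._last_fired = None  # time the gate last allowed a retry

    def should_retry(self, run_tags, now: datetime) -> bool:
        # Loop breaker: runs created by the automation carry the tag and never retrigger.
        if self.retry_tag in run_tags:
            return False
        # Cooldown: ignore crashes that land inside the window after the last firing.
        if self._last_fired is not None and now - self._last_fired < self.cooldown:
            return False
        self._last_fired = now
        return True


gate = RetryGate(cooldown=timedelta(minutes=10))
t0 = datetime(2025, 8, 12, 13, 0)
print(gate.should_retry([], t0))                                      # original run crashes
print(gate.should_retry([], t0 + timedelta(minutes=3)))               # crash inside the cooldown
print(gate.should_retry(["auto-retry"], t0 + timedelta(minutes=30)))  # a retried run crashes
print(gate.should_retry([], t0 + timedelta(minutes=30)))              # original run, cooldown over
```

The cooldown only slows a loop down, while the tag check is what stops a retried run from spawning another retry.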

José Agretti

08/12/2025, 2:00 PM

In the Automation:

- Trigger: Flow run state entered → Crashed

- Action: Run deployment → select the same deployment

- Parameters (JSON): set it to render the crashed run’s parameters:

{{ event.payload.flow_run.parameters | tojson }}

@Marvin this is not working, or at least I'm seeing a warning sign in parameters

Marvin

08/12/2025, 2:00 PM

That warning usually means the editor can’t statically validate your template as JSON. Two fixes:

1) Make sure you’re outputting a JSON object, not a string
- Don’t wrap the template in quotes.
- Use a default and tojson so the rendered result is valid JSON even if parameters are missing:

{{ event.payload.flow_run.parameters | default({}, true) | tojson }}

If your event preview shows a different path, swap it in (e.g., `event.flow_run.parameters`).

2) Verify the path with “Preview event data”
- In the Automation editor, open “Preview event data” on your trigger.
- Find where `parameters` live in that JSON.
- Use that exact path in the template.

Common pitfalls to check:
- Quoted template: remove surrounding quotes. The field should be the expression itself, not a JSON string containing it.
- Wrong path: some environments expose `event.flow_run.parameters` instead of `event.payload.flow_run.parameters`.
- Missing params: if the source run had no parameters, the default filter above ensures `{}` so the JSON stays valid.

If the UI still shows a warning but lets you save, it’s just the linter being grumpy because it can’t evaluate Jinja at edit time. You can “Test automation” or trigger a dummy crash to confirm the action renders correct JSON.

Paste what you see in “Preview event data” around `parameters` if you want me to give you the exact template for your account.
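The difference between rendering a JSON object and rendering a JSON string is easy to check outside the UI. The sketch below approximates `default({}, true) | tojson` with the standard library (Jinja's `tojson` additionally escapes a few HTML-sensitive characters, which `json.dumps` does not). The `render_parameters` helper and the sample event dicts are hypothetical and only assume the `payload → flow_run → parameters` shape discussed in this thread.

```python
import json


def render_parameters(event: dict) -> str:
    """Approximate `{{ event.payload.flow_run.parameters | default({}, true) | tojson }}`."""
    params = event.get("payload", {}).get("flow_run", {}).get("parameters")
    # default({}, true): fall back to {} when the value is missing OR falsy (None, {}).
    if not params:
        params = {}
    return json.dumps(params)


# Hypothetical sample events, shaped like the path assumed in the template.
with_params = {"payload": {"flow_run": {"parameters": {"date": "2025-08-12", "limit": 5}}}}
without_params = {"payload": {"flow_run": {}}}

good = render_parameters(with_params)
print(json.loads(good))                   # parses back to a dict, i.e. a JSON object
print(render_parameters(without_params))  # falls back to an empty object

# Pitfall: quoting the rendered template yields a JSON *string*, not an object.
quoted = json.dumps(good)
print(type(json.loads(quoted)).__name__)
```

If the last line prints `str` for your own rendered value, the template is being treated as a quoted string rather than an object.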

José Agretti

08/12/2025, 2:02 PM

@Marvin

{
  "type": "event",
  "match": {
    "prefect.resource.id": "prefect.flow-run.*"
  },
  "match_related": {
    "prefect.resource.role": "flow",
    "prefect.resource.id": [
      "prefect.flow.ea038696-35c3-405b-a0b6-0c05ae284b81"
    ]
  },
  "after": [],
  "expect": [
    "prefect.flow-run.Crashed"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Reactive",
  "threshold": 1,
  "within": 0
}

Marvin

08/12/2025, 2:03 PM

What you pasted is the trigger rule, not the event payload. We need the payload to know the exact path to the parameters.

Do this to confirm the path:
- In the Automation editor, open your trigger → click “Preview event data”.
- Expand until you see something like: payload → flow_run → parameters: {...}
- Use that exact path in the Parameters box, unquoted, with a safe default:

{{ event.payload.flow_run.parameters | default({}, true) | tojson }}

If your preview shows a different shape, try these common variants:
- If payload is top-level in the template context:

{{ payload.flow_run.parameters | default({}, true) | tojson }}

- If flow_run is exposed top-level:

{{ flow_run.parameters | default({}, true) | tojson }}

Tips:
- Don’t wrap the template in quotes; the field must render to a JSON object, not a JSON string.
- The little warning just means the UI can’t statically evaluate Jinja. Use “Test automation” to validate the rendered JSON.
- If the original run had no parameters, the `default({}, true)` keeps it valid.

If you can paste the “Preview event data” section that contains `parameters`, I’ll give you the exact one-liner for your account.

José Agretti

08/12/2025, 2:04 PM

@Marvin there's no "Preview event data". I'm using Prefect 3.4.11

Marvin

08/12/2025, 2:07 PM

Right, 3.4.11 doesn’t have “Preview event data”. Two ways forward:

Option A: Try the two common templates

One of these usually matches the event context in 3.4.x. Put it in the Parameters field (unquoted) and save.
- Most common:

{{ event.payload.flow_run.parameters | default({}, true) | tojson }}

- Alternate:

{{ event.flow_run.parameters | default({}, true) | tojson }}

If one renders, you’re done. The warning icon is just the editor grumbling because it can’t evaluate Jinja at edit time.

Option B: Confirm the exact path via CLI (deterministic)
- In a terminal, stream one event and make any flow run complete/fail/crash (the path to parameters is the same across states):

prefect events stream --run-once --format json

- You’ll get a JSON blob. Look for something like:

{
  "event": "prefect.flow-run.Completed",
  "payload": {
    "flow_run": {
      "parameters": { ... }
    }
  }
}

- Use that exact path in your Automation:

{{ event.payload.flow_run.parameters | default({}, true) | tojson }}

If your event JSON shows flow_run at the top level of the event object instead of under payload, use:

{{ event.flow_run.parameters | default({}, true) | tojson }}

Notes:
- Do not wrap the template in quotes; it must render to a JSON object, not a string.
- The `default({}, true)` keeps it valid if the original run had no parameters.

If you paste a single event JSON from that CLI output (trimmed around flow_run), I’ll give you the exact one‑liner for your workspace.

Links for posterity: Automations, Events.
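Rather than reading the event JSON by eye, a few lines of Python can report where `parameters` sits. This is a sketch under assumptions: `find_key_paths` is a hypothetical helper, and the sample blob only mimics the shape quoted in this thread, so paste your own CLI output in its place.

```python
import json


def find_key_paths(node, key, prefix="event"):
    """Yield dotted paths to every occurrence of `key` in nested dicts and lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            path = f"{prefix}.{name}"
            if name == key:
                yield path
            yield from find_key_paths(value, key, path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_key_paths(item, key, f"{prefix}[{index}]")


# Sample blob with an assumed shape; replace it with one event from the CLI output.
raw = '{"event": "prefect.flow-run.Completed", "payload": {"flow_run": {"parameters": {"n": 1}}}}'
event = json.loads(raw)

for path in find_key_paths(event, "parameters"):
    # Print a ready-to-paste template for each place the key was found.
    print("{{ " + path + " | default({}, true) | tojson }}")
```

If nothing is printed, the event does not carry parameters at all, and no path variant will render them.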

José Agretti

08/12/2025, 2:11 PM

Trying to find this:

2) Tag-based loop breaker (recommended)
- In the Action → Run deployment, add a run tag like `auto-retry`.
- In the Trigger filters, add “Flow run tags does not contain `auto-retry`”.

@Marvin but I can't see the add a run tag part

Marvin

08/12/2025, 2:14 PM

You’re not missing it: in 3.4.11 the “Run deployment” action doesn’t expose Run tags in the UI. Hence, no place to add `auto-retry` there. Two solid workarounds:

A) Use the run name as the loop breaker
- In the action’s Run name field:

{{ event.payload.flow_run.name }} [auto-retry]

- In the trigger filters: “Flow run name does not contain auto-retry”
- Keep your Parameters templating as-is to reuse the original params.

B) Use cooldown/rate-limit only
- Action policy → set a Cooldown (e.g., 10m) and optionally a maximum run count per window. This won’t prevent infinite retries forever, but it stops rapid loops.

If you really want a parameter-based loop breaker, you can inject a marker param (and then filter on it if your Automation filters include parameter checks):

{% set p = event.payload.flow_run.parameters or {} %}
{% set _ = p.update({'_auto_retry': true}) %}
{{ p | tojson }}

Then add a trigger filter like “Flow run parameters does not contain _auto_retry = true” (only if your UI exposes parameter filters).

If upgrading is an option, newer UI builds expose more action fields; otherwise the name-based approach is the least painful.
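The name-based loop breaker and the marker parameter can be prototyped in plain Python before they are wired into templates. This is a sketch of the logic only: `plan_retry`, the sample run names, and the `_auto_retry` key are illustrative, and a marker parameter presumably only works if the flow itself accepts a parameter with that name.

```python
import json

RETRY_MARKER = "[auto-retry]"


def plan_retry(flow_run: dict):
    """Return (run_name, parameters_json) for a retry, or None if the run is itself a retry."""
    name = flow_run.get("name", "")
    # Name-based loop breaker: skip runs whose name already carries the marker text.
    if "auto-retry" in name:
        return None
    # Copy before adding the marker so the original event data is left untouched.
    params = dict(flow_run.get("parameters") or {})
    params["_auto_retry"] = True
    return f"{name} {RETRY_MARKER}", json.dumps(params)


original = {"name": "nimble-otter", "parameters": {"date": "2025-08-12"}}
print(plan_retry(original))
print(plan_retry({"name": "nimble-otter [auto-retry]", "parameters": {}}))
```

Copying the parameters instead of mutating them in place avoids changing the source event data, which is the main difference from the `p.update(...)` template.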
