<@ULVA73B9P> can I use automations to retry a flow...
# ask-marvin
j
@Marvin can I use automations to retry a flow that Crashed?
m
thought for 106.7 seconds
Short answer: yes. Use an Automation that triggers when a flow run enters the Crashed state and have it start a new run of the same deployment.

What this does (and doesn’t):
- It won’t resurrect the crashed run. It will submit a new run of the deployment (effectively a “retry”).
- Built-in `retries` on your flow/task only handle failures raised by your code; they can’t help when the process crashes and the engine isn’t around to schedule a retry.

How to set it up (UI):
- Go to Automations → New automation
- Trigger: “Flow run state entered”
- Condition/filter: State type = Crashed (optionally filter by specific deployment or tags)
- Action: “Run deployment” (select the same deployment; optionally pass the same parameters)
- Add a cooldown window/rate limit so you don’t loop endlessly if crashes persist

Docs for reference:
- Automations overview: https://docs.prefect.io/latest/automations/
- States: https://docs.prefect.io/latest/concepts/states/

Notes for Vertex AI work pools (since that’s your setup): each Automation-triggered run will submit a fresh Vertex job. If you’re seeing repeated Crashes, add a cooldown and fix the underlying cause (image, package, resource limits) before letting it auto-loop.

If you’d prefer to do this programmatically or need help wiring parameter pass-through, tell me where you’re running (Prefect Cloud vs OSS) and I’ll sketch the exact steps.
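
For readers who want to see the trigger as data rather than UI clicks, here is a minimal Python sketch that builds an event-trigger dictionary shaped like the trigger JSON pasted later in this thread. The helper name `crashed_flow_run_trigger` and the placeholder flow ID are hypothetical; only the field names come from the thread, and nothing here calls the Prefect API.

```python
import json


def crashed_flow_run_trigger(flow_id: str) -> dict:
    """Build an event-trigger spec that reacts to Crashed runs of one flow.

    Field names mirror the trigger JSON shown later in this thread;
    the function itself is a hypothetical helper, not a Prefect API.
    """
    return {
        "type": "event",
        # Match any flow-run resource...
        "match": {"prefect.resource.id": "prefect.flow-run.*"},
        # ...that is related to the given flow.
        "match_related": {
            "prefect.resource.role": "flow",
            "prefect.resource.id": [f"prefect.flow.{flow_id}"],
        },
        "after": [],
        # Fire when the Crashed state event is seen.
        "expect": ["prefect.flow-run.Crashed"],
        "for_each": ["prefect.resource.id"],
        "posture": "Reactive",
        "threshold": 1,
        "within": 0,
    }


if __name__ == "__main__":
    # "<your-flow-id>" is a placeholder, not a real ID.
    print(json.dumps(crashed_flow_run_trigger("<your-flow-id>"), indent=2))
```

Keeping the spec in a function makes it easy to diff against what the Automation editor shows for your own trigger.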
j
@Marvin how do I use the same parameters that the Crashed flow was using?
m
thought for 83.9 seconds
Yes. Have the Automation’s “Run deployment” action reuse the triggering run’s parameters via templating.

In the Automation:
- Trigger: Flow run state entered → Crashed
- Action: Run deployment → select the same deployment
- Parameters (JSON): set it to render the crashed run’s parameters:
    {{ event.payload.flow_run.parameters | tojson }}

Notes:
- The action field supports Jinja templating with access to the event. In most orgs the parameters live at `event.payload.flow_run.parameters`. If your UI’s “Preview event data” shows a slightly different path, use that one.
- You can also template the run name if you like:
    {{ event.payload.flow_run.name }} (retry)
- Add a cooldown/rate-limit so you don’t spin forever if it keeps crashing.

Refs:
- Automations: docs
- States/events: states and events

If you want to merge/override a parameter (e.g., set `{"retry": true}`), say the word and I’ll show a safe pattern.
j
@Marvin How do I do this?
Add a cooldown/rate-limit so you don’t spin forever if it keeps crashing.
m
thought for 39.1 seconds
Two simple guardrails:

1) Automation cooldown
- Edit your Automation → scroll to Action policy → set Cooldown (e.g., 10m).
- This prevents the automation from firing again for any matching crash within that window, so you don’t end up machine-gunning retries.

2) Tag-based loop breaker (recommended)
- In the Action → Run deployment, add a run tag like `auto-retry`.
- In the Trigger filters, add “Flow run tags does not contain `auto-retry`”.
- Result: only original runs can trigger the automation; the retried runs won’t retrigger if they crash too.

Optional belt-and-suspenders: add a concurrency limit tag to the deployment (value 1) so only one retry runs at a time. See https://docs.prefect.io/latest/concepts/concurrency/.
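
To make the two guardrails concrete, here is a small Python sketch of the decision they are meant to implement together. It is illustrative logic only, assuming a 10-minute cooldown and the `auto-retry` tag from the message above; `should_retry` is a hypothetical helper, and Prefect evaluates its own trigger filters rather than calling anything like this.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

RETRY_TAG = "auto-retry"  # marker tag suggested in the message above


def should_retry(
    run_tags: Iterable[str],
    crashed_at: datetime,
    last_fired_at: Optional[datetime],
    cooldown: timedelta = timedelta(minutes=10),
) -> bool:
    """Return True if a crashed run should trigger a retry.

    Hypothetical helper that models the guardrails; not a Prefect API.
    """
    # Loop breaker: a run that is itself a retry never triggers another.
    if RETRY_TAG in set(run_tags):
        return False
    # Cooldown: ignore crashes that arrive too soon after the last firing.
    if last_fired_at is not None and crashed_at - last_fired_at < cooldown:
        return False
    return True
```

The tag check comes first because it is the only one of the two that stops a loop outright; the cooldown only slows it down.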
j
In the Automation:
- Trigger: Flow run state entered → Crashed
- Action: Run deployment → select the same deployment
- Parameters (JSON): set it to render the crashed run’s parameters:
{{ event.payload.flow_run.parameters | tojson }}
@Marvin this is not working, or at least I'm seeing a warning sign in parameters
m
thought for 44.2 seconds
That warning usually means the editor can’t statically validate your template as JSON. Two fixes:

1) Make sure you’re outputting a JSON object, not a string
- Don’t wrap the template in quotes.
- Use a default and tojson so the rendered result is valid JSON even if parameters are missing:
    {{ event.payload.flow_run.parameters | default({}, true) | tojson }}
  If your event preview shows a different path, swap it in (e.g., `event.flow_run.parameters`).

2) Verify the path with “Preview event data”
- In the Automation editor, open “Preview event data” on your trigger.
- Find where `parameters` live in that JSON.
- Use that exact path in the template.

Common pitfalls to check:
- Quoted template: remove surrounding quotes. The field should be the expression itself, not a JSON string containing it.
- Wrong path: some environments expose `event.flow_run.parameters` instead of `event.payload.flow_run.parameters`.
- Missing params: if the source run had no parameters, the default filter above ensures `{}` so the JSON stays valid.

If the UI still shows a warning but lets you save, it’s just the linter being grumpy because it can’t evaluate Jinja at edit time. You can “Test automation” or trigger a dummy crash to confirm the action renders correct JSON.

Paste what you see in “Preview event data” around `parameters` if you want me to give you the exact template for your account.
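
The object-versus-string pitfall can be reproduced with plain Python, without Prefect or Jinja. This is a rough sketch: `json.dumps` stands in for the `tojson` filter and `params or {}` stands in for `default({}, true)`; the function names are hypothetical.

```python
import json
from typing import Optional


def render_parameters(params: Optional[dict]) -> str:
    """Approximate `params | default({}, true) | tojson`: falsy input becomes {}."""
    return json.dumps(params or {})


def render_quoted(params: Optional[dict]) -> str:
    """Approximate wrapping the whole template in double quotes."""
    return f'"{render_parameters(params)}"'


def is_json_object(rendered: str) -> bool:
    """True only if the text parses as a JSON object (a Python dict)."""
    try:
        return isinstance(json.loads(rendered), dict)
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print(is_json_object(render_parameters({"n": 3})))  # unquoted template
    print(is_json_object(render_quoted({"n": 3})))      # quoted template
    print(is_json_object(render_parameters(None)))      # run had no parameters
```

In this sketch the quoted variant never yields an object: it is either invalid JSON or a JSON string, which is why the template should be left unquoted.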
j
@Marvin
{
  "type": "event",
  "match": {
    "prefect.resource.id": "prefect.flow-run.*"
  },
  "match_related": {
    "prefect.resource.role": "flow",
    "prefect.resource.id": [
      "prefect.flow.ea038696-35c3-405b-a0b6-0c05ae284b81"
    ]
  },
  "after": [],
  "expect": [
    "prefect.flow-run.Crashed"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Reactive",
  "threshold": 1,
  "within": 0
}
m
thought for 32.8 seconds
What you pasted is the trigger rule, not the event payload. We need the payload to know the exact path to the parameters.

Do this to confirm the path:
- In the Automation editor, open your trigger → click “Preview event data”.
- Expand until you see something like: payload → flow_run → parameters: {...}
- Use that exact path in the Parameters box, unquoted, with a safe default:
    {{ event.payload.flow_run.parameters | default({}, true) | tojson }}

If your preview shows a different shape, try these common variants:
- If payload is top-level in the template context:
    {{ payload.flow_run.parameters | default({}, true) | tojson }}
- If flow_run is exposed top-level:
    {{ flow_run.parameters | default({}, true) | tojson }}

Tips:
- Don’t wrap the template in quotes; the field must render to a JSON object, not a JSON string.
- The little warning just means the UI can’t statically evaluate Jinja. Use “Test automation” to validate the rendered JSON.
- If the original run had no parameters, the `default({}, true)` keeps it valid.

If you can paste the “Preview event data” section that contains `parameters`, I’ll give you the exact one-liner for your account.
j
@Marvin there's no "Preview event data". I'm using Prefect 3.4.11
m
thought for 170.9 seconds
Right, 3.4.11 doesn’t have “Preview event data”. Two ways forward:

Option A: Try the two common templates
One of these usually matches the event context in 3.4.x. Put it in the Parameters field (unquoted) and save.
- Most common:
    {{ event.payload.flow_run.parameters | default({}, true) | tojson }}
- Alternate:
    {{ event.flow_run.parameters | default({}, true) | tojson }}
If one renders, you’re done. The warning icon is just the editor grumbling because it can’t evaluate Jinja at edit time.

Option B: Confirm the exact path via CLI (deterministic)
- In a terminal, stream one event and make any flow run complete/fail/crash (the path to parameters is the same across states):
    prefect events stream --run-once --format json
- You’ll get a JSON blob. Look for something like:
    {
      "event": "prefect.flow-run.Completed",
      "payload": {
        "flow_run": {
          "parameters": { ... }
        }
      }
    }
- Use that exact path in your Automation:
    {{ event.payload.flow_run.parameters | default({}, true) | tojson }}
If your event JSON shows flow_run at the top level of the event object instead of under payload, use:
    {{ event.flow_run.parameters | default({}, true) | tojson }}

Notes:
- Do not wrap the template in quotes; it must render to a JSON object, not a string.
- The `default({}, true)` keeps it valid if the original run had no parameters.

If you paste a single event JSON from that CLI output (trimmed around flow_run), I’ll give you the exact one-liner for your workspace.

Links for posterity: Automations, Events.
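
Rather than eyeballing the JSON blob, the path can be found mechanically. Below is a minimal Python sketch that walks a parsed event and reports every dotted path ending in `parameters`. The sample event is hypothetical and simply copies the shape shown in the message above; a real event from your workspace may look different, and `find_key_paths` is an illustrative helper.

```python
import json
from typing import Any, Iterator, Tuple


def find_key_paths(node: Any, key: str, prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield dotted paths to every occurrence of `key` in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            path = prefix + (str(k),)
            if k == key:
                yield ".".join(path)
            # Keep descending in case the key also appears deeper down.
            yield from find_key_paths(v, key, path)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key_paths(item, key, prefix + (str(i),))


# Hypothetical event, shaped like the example in the message above.
sample_event = json.loads(
    '{"event": "prefect.flow-run.Completed",'
    ' "payload": {"flow_run": {"parameters": {"n": 3}}}}'
)

# Prefix with "event." to match how the templates above address the event.
paths = ["event." + p for p in find_key_paths(sample_event, "parameters")]
print(paths)  # ['event.payload.flow_run.parameters']
```

To try it on real data, save one event from the CLI to a file and replace `sample_event` with the result of `json.load` on that file. If the list comes back empty, the event carries no `parameters` key at all and neither template above will find one.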
j
Trying to find this:
2) Tag-based loop breaker (recommended)
- In the Action → Run deployment, add a run tag like `auto-retry`.
- In the Trigger filters, add “Flow run tags does not contain `auto-retry`”.
@Marvin but I can't see the “add a run tag” part
m
thought for 137.5 seconds
You’re not missing it: in 3.4.11 the “Run deployment” action doesn’t expose Run tags in the UI. Hence, no place to add `auto-retry` there. Two solid workarounds:

A) Use the run name as the loop breaker
- In the action’s Run name field:
    {{ event.payload.flow_run.name }} [auto-retry]
- In the trigger filters: “Flow run name does not contain auto-retry”
- Keep your Parameters templating as-is to reuse the original params.

B) Use cooldown/rate-limit only
- Action policy → set a Cooldown (e.g., 10m) and optionally a maximum run count per window. This won’t prevent infinite retries forever, but it stops rapid loops.

If you really want a parameter-based loop breaker, you can inject a marker param (and then filter on it if your Automation filters include parameter checks):
    {% set p = event.payload.flow_run.parameters or {} %}
    {% set _ = p.update({'_auto_retry': true}) %}
    {{ p | tojson }}
Then add a trigger filter like “Flow run parameters does not contain _auto_retry = true” (only if your UI exposes parameter filters).

If upgrading is an option, newer UI builds expose more action fields; otherwise the name-based approach is the least painful.
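
The two markers discussed above (a run-name suffix and an `_auto_retry` parameter) can be sketched in plain Python to show what each one adds and how a retry would be recognized. All helper names are hypothetical and nothing here touches Prefect. One assumption worth checking before using the parameter marker: the flow probably has to accept an `_auto_retry` argument, since an unexpected parameter may be rejected when the run is created.

```python
from typing import Optional

MARKER = "_auto_retry"         # marker parameter from the template above
NAME_SUFFIX = "[auto-retry]"   # run-name suffix from workaround A


def with_retry_marker(params: Optional[dict]) -> dict:
    """Return a copy of params with the marker set; the input is not mutated."""
    merged = dict(params or {})
    merged[MARKER] = True
    return merged


def retry_name(original_name: str) -> str:
    """Append the suffix so a name filter can exclude retried runs."""
    return f"{original_name} {NAME_SUFFIX}"


def is_retry(name: str, params: Optional[dict]) -> bool:
    """A run counts as a retry if either marker is present."""
    return "auto-retry" in name or bool((params or {}).get(MARKER))
```

Copying the parameters before adding the marker keeps the original run's values untouched, which matters if the same dictionary is reused elsewhere.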