José Agretti
08/12/2025, 12:48 PM
Marvin
08/12/2025, 12:48 PM
Marvin
08/12/2025, 12:50 PM
`retries` on your flow/task only handle Failures raised by your code; they can’t help when the process crashes and the engine isn’t around to schedule a retry.
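To make the distinction concrete, here is a minimal sketch in plain Python. It is a toy stand-in, not Prefect's retry implementation: `run_with_retries`, `raises_then_succeeds`, and `hard_crash` are hypothetical names. An in-process retry loop can recover from a raised exception, but it never gets a chance to run again when the process itself dies.

```python
import subprocess
import sys
import textwrap

# Child program: a tiny in-process retry loop (hypothetical, for illustration only).
CHILD = textwrap.dedent("""
    import os, sys

    def run_with_retries(fn, retries):
        # Retries only help when fn raises an exception we can catch.
        for attempt in range(retries + 1):
            try:
                return fn(attempt)
            except Exception as exc:
                print(f"attempt {attempt} failed: {exc}", flush=True)
        raise RuntimeError("retries exhausted")

    def raises_then_succeeds(attempt):
        if attempt == 0:
            raise ValueError("boom")
        return "ok"

    def hard_crash(attempt):
        # Simulates the process being killed: no exception, no cleanup, no retry.
        os._exit(137)

    mode = sys.argv[1]
    fn = raises_then_succeeds if mode == "exception" else hard_crash
    print(run_with_retries(fn, retries=2), flush=True)
""")


def run_child(mode):
    """Run the child program in a fresh interpreter and return (exit code, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    print(run_child("exception"))  # the retry loop recovers
    print(run_child("crash"))      # the process dies before any retry can happen
```

In the crash case something outside the process has to notice and resubmit the work, which is the role the Automation plays here.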
How to set it up (UI):
- Go to Automations → New automation
- Trigger: “Flow run state entered”
- Condition/filter: State type = Crashed (optionally filter by specific deployment or tags)
- Action: “Run deployment” (select the same deployment; optionally pass the same parameters)
- Add a cooldown window/rate limit so you don’t loop endlessly if crashes persist
Docs for reference:
- Automations overview: https://docs.prefect.io/latest/automations/
- States: https://docs.prefect.io/latest/concepts/states/
Notes for Vertex AI work pools (since that’s your setup): each Automation-triggered run will submit a fresh Vertex job. If you’re seeing repeated Crashes, add a cooldown and fix the underlying cause (image, package, resource limits) before letting it auto-loop.
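A cooldown can be pictured as a small gate in front of the action. The sketch below is plain Python and only illustrates the idea; `CooldownGate` is a hypothetical helper, not a Prefect class, and the real cooldown is configured in the Automation itself.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


class CooldownGate:
    """Allow at most one action per key within a cooldown window."""

    def __init__(self, cooldown: timedelta) -> None:
        self.cooldown = cooldown
        self._last_fired: Dict[str, datetime] = {}

    def allow(self, key: str, now: datetime) -> bool:
        last = self._last_fired.get(key)
        if last is not None and now - last < self.cooldown:
            # Still inside the window: suppress the action.
            return False
        self._last_fired[key] = now
        return True


if __name__ == "__main__":
    gate = CooldownGate(timedelta(minutes=10))
    t0 = datetime(2025, 8, 12, 12, 50, tzinfo=timezone.utc)
    for offset in (0, 1, 11):
        fired = gate.allow("my-deployment", t0 + timedelta(minutes=offset))
        print(offset, fired)
```

With a 10 minute window, a crash one minute after the first resubmission is ignored, while a crash at minute 11 triggers a new run.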
If you’d prefer to do this programmatically or need help wiring parameter pass-through, tell me where you’re running (Prefect Cloud vs OSS) and I’ll sketch the exact steps.
José Agretti
08/12/2025, 12:55 PM
Marvin
08/12/2025, 12:55 PM
Marvin
08/12/2025, 12:56 PM
{{ event.payload.flow_run.parameters | tojson }}
Notes:
- The action field supports Jinja templating with access to the event. In most orgs the parameters live at `event.payload.flow_run.parameters`. If your UI’s “Preview event data” shows a slightly different path, use that one.
- You can also template the run name if you like:
{{ event.payload.flow_run.name }} (retry)
- Add a cooldown/rate-limit so you don’t spin forever if it keeps crashing.
Refs:
- Automations: docs
- States/events: states and events
If you want to merge/override a parameter (e.g., set `{"retry": true}`), say the word and I’ll show a safe pattern.
José Agretti
08/12/2025, 1:09 PM
Add a cooldown/rate-limit so you don’t spin forever if it keeps crashing.
Marvin
08/12/2025, 1:09 PM
Marvin
08/12/2025, 1:10 PM
`auto-retry`.
- In the Trigger filters, add “Flow run tags does not contain `auto-retry`”.
- Result: only original runs can trigger the automation; the retried runs won’t retrigger if they crash too.
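The tag-based loop breaker boils down to two rules, sketched here in plain Python. The function names are hypothetical and the `"CRASHED"` state type string is an assumption about how the state is spelled; the actual filtering happens in the Automation's trigger, not in your code.

```python
from typing import Iterable, List

RETRY_TAG = "auto-retry"


def should_trigger_retry(state_type: str, tags: Iterable[str]) -> bool:
    """Fire only for crashed runs that were not themselves created by the automation."""
    return state_type == "CRASHED" and RETRY_TAG not in set(tags)


def tags_for_retry(original_tags: Iterable[str]) -> List[str]:
    """Tags to put on the resubmitted run so it cannot retrigger the automation."""
    return sorted(set(original_tags) | {RETRY_TAG})


if __name__ == "__main__":
    original = ["nightly"]
    print(should_trigger_retry("CRASHED", original))   # original run: eligible
    retried = tags_for_retry(original)
    print(retried)
    print(should_trigger_retry("CRASHED", retried))    # retried run: filtered out
```

Each original run gets at most one automatic retry this way, which is the behavior described above.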
Optional belt-and-suspenders: add a concurrency limit tag to the deployment (value 1) so only one retry runs at a time. See https://docs.prefect.io/latest/concepts/concurrency/.
José Agretti
08/12/2025, 2:00 PM
In the Automation:
- Trigger: Flow run state entered → Crashed
- Action: Run deployment → select the same deployment
- Parameters (JSON): set it to render the crashed run’s parameters:
@Marvin this is not working, or at least I'm seeing a warning sign in parameters
{{ event.payload.flow_run.parameters | tojson }}
Marvin
08/12/2025, 2:00 PM
Marvin
08/12/2025, 2:00 PM
{{ event.payload.flow_run.parameters | default({}, true) | tojson }}
If your event preview shows a different path, swap it in (e.g., `event.flow_run.parameters`).
2) Verify the path with “Preview event data”
- In the Automation editor, open “Preview event data” on your trigger.
- Find where `parameters` live in that JSON.
- Use that exact path in the template.
Common pitfalls to check:
- Quoted template: remove surrounding quotes. The field should be the expression itself, not a JSON string containing it.
- Wrong path: some environments expose `event.flow_run.parameters` instead of `event.payload.flow_run.parameters`.
- Missing params: if the source run had no parameters, the default filter above ensures `{}` so the JSON stays valid.
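The quoting and missing-params pitfalls are easier to see with a rough Python equivalent of the template. This is only an approximation of what `default({}, true) | tojson` produces (`render_parameters` and `classify` are hypothetical helpers, and Jinja's exact output formatting may differ), but it shows why the field must render to a bare JSON object.

```python
import json
from typing import Any, Optional


def render_parameters(parameters: Optional[dict]) -> str:
    """Rough equivalent of `parameters | default({}, true) | tojson`."""
    # `default({}, true)` replaces any falsy value (None, empty) with {}.
    return json.dumps(parameters or {})


def classify(rendered: str) -> str:
    """Report what the rendered field would look like to a JSON parser."""
    try:
        value: Any = json.loads(rendered)
    except json.JSONDecodeError:
        return "invalid"
    return "object" if isinstance(value, dict) else "string"


if __name__ == "__main__":
    good = render_parameters({"n": 1})
    print(good, classify(good))                     # unquoted template: JSON object
    print(classify('"' + good + '"'))               # quoted template: broken JSON
    print(classify('"' + render_parameters(None) + '"'))  # quoted, no params: a string
```

The quoted variants either fail to parse or come back as a string instead of an object, which is the failure mode the “Quoted template” bullet describes.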
If the UI still shows a warning but lets you save, it’s just the linter being grumpy because it can’t evaluate Jinja at edit time. You can “Test automation” or trigger a dummy crash to confirm the action renders correct JSON.
Paste what you see in “Preview event data” around `parameters` if you want me to give you the exact template for your account.
José Agretti
08/12/2025, 2:02 PM
{
  "type": "event",
  "match": {
    "prefect.resource.id": "prefect.flow-run.*"
  },
  "match_related": {
    "prefect.resource.role": "flow",
    "prefect.resource.id": [
      "prefect.flow.ea038696-35c3-405b-a0b6-0c05ae284b81"
    ]
  },
  "after": [],
  "expect": [
    "prefect.flow-run.Crashed"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Reactive",
  "threshold": 1,
  "within": 0
}
Marvin
08/12/2025, 2:02 PM
Marvin
08/12/2025, 2:03 PM
{{ event.payload.flow_run.parameters | default({}, true) | tojson }}
If your preview shows a different shape, try these common variants:
- If payload is top-level in the template context:
{{ payload.flow_run.parameters | default({}, true) | tojson }}
- If flow_run is exposed top-level:
{{ flow_run.parameters | default({}, true) | tojson }}
Tips:
- Don’t wrap the template in quotes; the field must render to a JSON object, not a JSON string.
- The little warning just means the UI can’t statically evaluate Jinja. Use “Test automation” to validate the rendered JSON.
- If the original run had no parameters, the `default({}, true)` keeps it valid.
If you can paste the “Preview event data” section that contains `parameters`, I’ll give you the exact one-liner for your account.
José Agretti
08/12/2025, 2:04 PM
Marvin
08/12/2025, 2:04 PM
Marvin
08/12/2025, 2:07 PM
{{ event.payload.flow_run.parameters | default({}, true) | tojson }}
- Alternate:
{{ event.flow_run.parameters | default({}, true) | tojson }}
If one renders, you’re done. The warning icon is just the editor grumbling because it can’t evaluate Jinja at edit time.
Option B: Confirm the exact path via CLI (deterministic)
- In a terminal, stream one event and make any flow run complete/fail/crash (the path to parameters is the same across states):
prefect events stream --run-once --format json
- You’ll get a JSON blob. Look for something like:
{
  "event": "prefect.flow-run.Completed",
  "payload": {
    "flow_run": {
      "parameters": { ... }
    }
  }
}
- Use that exact path in your Automation:
{{ event.payload.flow_run.parameters | default({}, true) | tojson }}
If your event JSON shows flow_run at the top level of the event object instead of under payload, use:
{{ event.flow_run.parameters | default({}, true) | tojson }}
Notes:
- Do not wrap the template in quotes; it must render to a JSON object, not a string.
- The default({}, true) keeps it valid if the original run had no parameters.
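If you save one event (or template context) as JSON, a few lines of Python can tell you which of the candidate paths mentioned in this thread actually resolves. This is a sketch under the assumption that the context is a nested dict; `CANDIDATE_PATHS`, `resolve`, and `find_parameters_path` are hypothetical names, and the candidate list simply mirrors the variants suggested above.

```python
from typing import Any, Mapping, Optional, Sequence, Tuple

# Paths suggested in this thread, in the order they were proposed.
CANDIDATE_PATHS: Tuple[Tuple[str, ...], ...] = (
    ("event", "payload", "flow_run", "parameters"),
    ("event", "flow_run", "parameters"),
    ("payload", "flow_run", "parameters"),
    ("flow_run", "parameters"),
)


def resolve(context: Any, path: Sequence[str]) -> Any:
    """Walk a nested mapping; return None if any key along the path is missing."""
    current = context
    for key in path:
        if not isinstance(current, Mapping) or key not in current:
            return None
        current = current[key]
    return current


def find_parameters_path(context: Mapping[str, Any]) -> Optional[str]:
    """Return the first dotted path that exists in the context, or None."""
    for path in CANDIDATE_PATHS:
        if resolve(context, path) is not None:
            return ".".join(path)
    return None


if __name__ == "__main__":
    sample = {"event": {"payload": {"flow_run": {"parameters": {"n": 1}}}}}
    print(find_parameters_path(sample))
```

If none of the candidates resolves, the function returns `None`, which is a signal to inspect the event JSON by hand rather than guess at another path.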
If you paste a single event JSON from that CLI output (trimmed around flow_run), I’ll give you the exact one‑liner for your workspace. Links for posterity: Automations, Events.
José Agretti
08/12/2025, 2:11 PM
`auto-retry`.
- In the Trigger filters, add “Flow run tags does not contain `auto-retry`”.
@Marvin but I can't see the add a run tag part
Marvin
08/12/2025, 2:11 PM
Marvin
08/12/2025, 2:14 PM
`auto-retry` there.
Two solid workarounds:
A) Use the run name as the loop breaker
- In the action’s Run name field:
{{ event.payload.flow_run.name }} [auto-retry]
- In the trigger filters: “Flow run name does not contain auto-retry”
- Keep your Parameters templating as-is to reuse the original params.
B) Use cooldown/rate-limit only
- Action policy → set a Cooldown (e.g., 10m) and optionally a maximum run count per window. This doesn’t cap the total number of retries, but it stops rapid loops.
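Workaround A comes down to a naming convention plus a substring check. Here is a minimal Python sketch of that convention; `retry_run_name` and `passes_name_filter` are hypothetical helpers that mirror the Run name template and the “does not contain” filter, not anything Prefect exposes.

```python
MARKER = "auto-retry"


def retry_run_name(original_name: str) -> str:
    """Mirror of the template `{{ event.payload.flow_run.name }} [auto-retry]`."""
    return f"{original_name} [{MARKER}]"


def passes_name_filter(run_name: str) -> bool:
    """Mirror of the trigger filter: flow run name does not contain the marker."""
    return MARKER not in run_name


if __name__ == "__main__":
    first = "daily-load"
    second = retry_run_name(first)
    print(second)
    print(passes_name_filter(first), passes_name_filter(second))
```

One caveat of any name-based scheme: a run whose original name already contains the marker text would never be retried, so pick a marker that your normal run names cannot contain.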
If you really want a parameter-based loop breaker, you can inject a marker param (and then filter on it if your Automation filters include parameter checks):
{% set p = event.payload.flow_run.parameters or {} %}
{% set _ = p.update({'_auto_retry': true}) %}
{{ p | tojson }}
Then add a trigger filter like “Flow run parameters does not contain _auto_retry = true” (only if your UI exposes parameter filters).
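For reference, the same marker-injection idea in plain Python, written so that it builds a new dict instead of mutating the original parameters. `with_retry_marker` is a hypothetical helper and `_auto_retry` is just the marker name used above; whether your flow accepts an extra parameter like this depends on its signature.

```python
import json
from typing import Any, Dict, Optional


def with_retry_marker(
    parameters: Optional[Dict[str, Any]], marker: str = "_auto_retry"
) -> Dict[str, Any]:
    """Return a copy of the parameters with the retry marker set to True."""
    merged = dict(parameters or {})  # copy, so the source run's params stay untouched
    merged[marker] = True
    return merged


def is_auto_retry(parameters: Optional[Dict[str, Any]], marker: str = "_auto_retry") -> bool:
    """Check used to break the loop: was this run created by the automation?"""
    return bool((parameters or {}).get(marker))


if __name__ == "__main__":
    original = {"date": "2025-08-12"}
    retried = with_retry_marker(original)
    print(json.dumps(retried))
    print(is_auto_retry(original), is_auto_retry(retried))
```

Copying rather than updating in place keeps the original run's parameters unchanged, which makes the before and after states easy to compare when debugging.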
If upgrading is an option, newer UI builds expose more action fields; otherwise the name-based approach is the least painful.