Ravish Mallya
09/24/2025, 2:32 PM
Marvin
09/24/2025, 2:32 PM
Marvin
09/24/2025, 2:36 PM
- Set PREFECT_RUNNER_HEARTBEAT_FREQUENCY to an integer ≥ 30 (seconds). Default is disabled (None).
- This emits flow-run heartbeat events that you can use for crash (zombie) detection via Automations.
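The constraint above (unset, or an integer of at least 30 seconds) can be checked before a deployment ships. This is a minimal sketch, not Prefect's own validation logic; `parse_heartbeat_frequency` and `MIN_HEARTBEAT_SECONDS` are hypothetical names introduced for illustration.

```python
from typing import Optional

# Lower bound stated above; an assumption copied from this thread, not read from Prefect.
MIN_HEARTBEAT_SECONDS = 30


def parse_heartbeat_frequency(raw: Optional[str]) -> Optional[int]:
    """Return the heartbeat frequency in seconds, or None when heartbeats are disabled."""
    if raw is None or raw.strip() == "":
        return None  # unset means heartbeats stay disabled
    value = int(raw)  # raises ValueError for non-integer input
    if value < MIN_HEARTBEAT_SECONDS:
        raise ValueError(
            f"heartbeat frequency must be >= {MIN_HEARTBEAT_SECONDS} seconds, got {value}"
        )
    return value
```

A typical call would pass `os.environ.get("PREFECT_RUNNER_HEARTBEAT_FREQUENCY")` so the check runs against whatever value actually reached the process.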
Ways to set it for ECS
1) Per-deployment (job variables)
- In your prefect.yaml:
deployments:
  - name: my-ecs-deployment
    entrypoint: path/to/flow.py:my_flow
    work_pool:
      name: my-ecs-pool
      job_variables:
        env:
          PREFECT_RUNNER_HEARTBEAT_FREQUENCY: "60"
- Or in Python when deploying:
from prefect import flow

@flow
def my_flow():
    ...

(
    my_flow
    .from_source("https://github.com/org/repo", entrypoint="flows/my_flow.py:my_flow")
    .deploy(
        name="ecs-deploy",
        work_pool_name="my-ecs-pool",
        job_variables={
            "env": {
                # Environment variable values are passed as strings
                "PREFECT_RUNNER_HEARTBEAT_FREQUENCY": "60"
            }
        },
    )
)
2) For all ECS runs in a work pool
- Edit the ECS work pool’s base job template (in the UI) and add:
"env": {
  "PREFECT_RUNNER_HEARTBEAT_FREQUENCY": "60"
}
This applies to every deployment that uses that pool.
Notes and related settings
- Flow-run heartbeat is different from the worker heartbeat. Workers have their own heartbeat interval (PREFECT_WORKER_HEARTBEAT_SECONDS, default 30). You don’t need to change that to enable flow-run heartbeats.
- To automatically mark stuck/“zombie” flows as Crashed when heartbeats are missed, create an Automation that reacts to missed heartbeats (see docs below).
Docs for deeper reference
- Detect zombie flows and heartbeats: Detect zombie flows
- Settings reference (includes runner and worker heartbeat settings): Settings reference
- Customize job variables for deployments: Customize job variables
- ECS worker/guide: ECS guide
- Worker health checks (separate from flow heartbeats): Worker healthchecks
Would you like this applied per-deployment or at the work pool level for all ECS runs? And do you also want help wiring an Automation to mark runs as Crashed if heartbeats are missed?
Ravish Mallya
09/24/2025, 4:11 PM
Marvin
09/24/2025, 4:12 PM
Marvin
09/24/2025, 4:13 PM
deployments:
  - name: my-ecs-deployment
    entrypoint: path/to/flow.py:my_flow
    work_pool:
      name: my-ecs-pool
      job_variables:
        env:
          PREFECT_RUNNER_HEARTBEAT_FREQUENCY: "60"
Equivalent in Python:
(
    my_flow
    .from_source("https://github.com/org/repo", entrypoint="flows/my_flow.py:my_flow")
    .deploy(
        name="ecs-deploy",
        work_pool_name="my-ecs-pool",
        job_variables={
            "env": {"PREFECT_RUNNER_HEARTBEAT_FREQUENCY": "60"}
        },
    )
)
If you prefer to set it for all runs in the ECS work pool, edit the pool’s base job template in the UI and add:
"env": {
  "PREFECT_RUNNER_HEARTBEAT_FREQUENCY": "60"
}
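The snippets in this reply quote the value as "60". Assuming the schema error came from a non-string env value, one way to avoid the problem when building job variables in Python is to coerce every value to a string first. This is only a sketch; `stringify_env` is a hypothetical helper, not part of Prefect.

```python
from typing import Any, Dict


def stringify_env(env: Dict[str, Any]) -> Dict[str, str]:
    """Return a copy of env with every value converted to a string."""
    return {key: str(value) for key, value in env.items()}


# The integer 60 becomes the string "60" before it reaches job_variables.
job_variables = {"env": stringify_env({"PREFECT_RUNNER_HEARTBEAT_FREQUENCY": 60})}
```

The resulting `job_variables` dict can then be passed to `.deploy(...)` in place of the literal shown earlier.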
If you’re still seeing the schema error, can you share the exact YAML snippet you’re using? I’ll spot-check the structure.
Refs:
- Customize job variables
- Detect zombie flows (heartbeats)
- Settings reference
Ravish Mallya
09/24/2025, 4:32 PM
Marvin
09/24/2025, 4:32 PM
Marvin
09/24/2025, 4:37 PM
1) Confirm that PREFECT_RUNNER_HEARTBEAT_FREQUENCY is set (≥ 30).
- Easiest check: log the resolved setting from within your flow at startup.
from prefect import flow
from prefect.settings import get_current_settings

@flow
def my_flow():
    s = get_current_settings()
    print(f"Runner heartbeat_frequency = {s.runner.heartbeat_frequency!r}")  # e.g., 60
    ...
- Alternatively, if you can exec or add a startup command in your container, run:
prefect config view --show-defaults --show-sources | grep PREFECT_RUNNER_HEARTBEAT_FREQUENCY
You should see it set from env and its value.
2) Look for heartbeat events in the UI
- Go to your flow run details → Events tab.
- You should see periodic heartbeat events arriving roughly every N seconds (your configured frequency).
- You can also check the global Events page and filter by the flow run ID; search for “heartbeat” in the event feed.
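To make "roughly every N seconds" concrete, the heartbeat timestamps seen in the Events tab can be checked for oversized gaps. This is a minimal sketch that assumes the timestamps were copied out by hand as `datetime` values; `find_heartbeat_gaps` and its `tolerance` parameter are hypothetical and do not call any Prefect API.

```python
from datetime import datetime, timedelta
from typing import List


def find_heartbeat_gaps(
    timestamps: List[datetime],
    frequency_seconds: int,
    tolerance: float = 2.0,
) -> List[timedelta]:
    """Return every gap between consecutive heartbeats longer than frequency * tolerance."""
    ordered = sorted(timestamps)
    limit = timedelta(seconds=frequency_seconds * tolerance)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap > limit:
            gaps.append(gap)
    return gaps
```

An empty result means every heartbeat arrived within the tolerated window; any returned gap marks a stretch where heartbeats stopped or were delayed.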
3) Quick end-to-end test (optional)
- Start a long-running flow with heartbeats enabled.
- Kill the ECS task/container mid-run. If heartbeats were flowing, Prefect will stop receiving them and your Automations (see below) can detect this and mark the run as Crashed or notify you.
4) Add an Automation to confirm behavior on missed heartbeats
- Create an Automation that reacts when flow-run heartbeats are missed longer than your interval (e.g., > 2x your frequency) and send a notification or mark the run as Crashed. This both verifies heartbeats and gives you protection against “zombie” runs.
- Docs: Detect zombie flows
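The Automation described in step 4 can be sketched as a plain payload built in Python. Every field name, event name, and action type below is an assumption modeled on Prefect's automation schema and should be checked against the Detect zombie flows docs before use; `zombie_automation_payload` is a hypothetical helper, and the 2x multiplier follows the example interval mentioned above.

```python
import json
from typing import Any, Dict


def zombie_automation_payload(frequency_seconds: int, multiplier: int = 2) -> Dict[str, Any]:
    """Build an automation payload that fires when heartbeats stop for a flow run."""
    within = frequency_seconds * multiplier  # seconds to wait before reacting
    return {
        "name": "Crash zombie flow runs",
        "trigger": {
            "type": "event",
            "posture": "Proactive",  # fire when an expected event does NOT arrive
            "after": ["prefect.flow-run.heartbeat"],  # assumed event name
            "expect": [
                "prefect.flow-run.heartbeat",
                "prefect.flow-run.Completed",
                "prefect.flow-run.Failed",
                "prefect.flow-run.Cancelled",
                "prefect.flow-run.Crashed",
            ],
            "match": {"prefect.resource.id": ["prefect.flow-run.*"]},
            "for_each": ["prefect.resource.id"],  # evaluate each flow run separately
            "threshold": 1,
            "within": within,
        },
        "actions": [
            {
                "type": "change-flow-run-state",  # assumed action type
                "state": "CRASHED",
                "message": "Flow run marked as crashed due to missing heartbeats.",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(zombie_automation_payload(60), indent=2))
```

With a frequency of 60 seconds, the trigger waits 120 seconds after the last heartbeat for either another heartbeat or a terminal state before acting.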
Troubleshooting if you don’t see heartbeats
- Ensure the env var is a string in YAML (e.g., "60", not 60) and placed under work_pool -> job_variables -> env.
- Confirm the ECS work pool’s base job template or the deployment’s job_variables actually include the env var that’s reaching the flow container.
- Make sure you’re looking at the right flow run’s Events tab and the time window is correct.
References
- Heartbeats (zombie detection): Detect zombie flows
- Settings reference: Settings reference
- Customize job variables (env in deployments): Customize job variables
If you want, share a link/screenshot of your flow run’s Events tab or your prefect.yaml snippet and I’ll double-check that everything’s wired correctly.