alex toney
06/30/2025, 2:52 PM
"reason": "Failed to set state: Run is already in terminal state COMPLETED.",
We're using the existing set of states for the zombie-run cleanup. Do you know why this is happening?
Marvin
06/30/2025, 2:53 PM
```python
# Runs inside the loop over candidate runs.
# Prefect's State exposes is_final() for the terminal-state check.
if flow_run.state.is_final():
    logger.info(f"Run {flow_run.id} is already in terminal state {flow_run.state.type}, skipping")
    continue
```
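2. Consider an optimistic approach when updating states to handle race conditions: attempt the update and treat a terminal-state rejection as benign:
```python
try:
    # NOTE: flow_run.set_state and crashed_state are as written in this reply
    # and are unverified; adapt them to however your cleanup sets state.
    await flow_run.set_state(crashed_state)
except Exception as e:
    if "already in terminal state" in str(e):
        logger.info(f"Run {flow_run.id} reached terminal state before cleanup")
        continue
    raise
```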
This error is actually a good thing - it's Prefect's way of maintaining state consistency. The error indicates that your cleanup automation might be trying to process the same runs multiple times or attempting to modify runs that have already been properly completed.
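To make the behaviour concrete, here is a minimal toy model of the guard in plain Python. It is only a sketch: `ToyRun`, `set_state`, and `cleanup` are hypothetical names rather than Prefect APIs, and the terminal set is assumed to be the four states named in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class StateType(str, Enum):
    RUNNING = "RUNNING"
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CRASHED = "CRASHED"
    CANCELLED = "CANCELLED"


# Assumption: these four state types are treated as terminal.
TERMINAL = {
    StateType.COMPLETED,
    StateType.FAILED,
    StateType.CRASHED,
    StateType.CANCELLED,
}


@dataclass
class ToyRun:
    """Hypothetical stand-in for a flow run; not a Prefect class."""
    id: str
    state: StateType


def set_state(run: ToyRun, new_state: StateType) -> None:
    """Reject any transition out of a terminal state."""
    if run.state in TERMINAL:
        raise RuntimeError(f"Run is already in terminal state {run.state.value}.")
    run.state = new_state


def cleanup(runs: list[ToyRun]) -> list[str]:
    """Mark only non-terminal runs as crashed; return the ids that changed."""
    changed = []
    for run in runs:
        if run.state in TERMINAL:
            continue  # finished on its own, so leave it alone
        set_state(run, StateType.CRASHED)
        changed.append(run.id)
    return changed
```

With one run still `RUNNING` and one already `COMPLETED`, `cleanup` changes only the first. Calling `set_state` directly on the completed run raises the same kind of "already in terminal state" error quoted at the top of the thread.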
You might also want to adjust your query for zombie runs to explicitly exclude runs that are already in terminal states when initially fetching them for cleanup.
alex toney
06/30/2025, 3:13 PM
{
  "type": "event",
  "match": {
    "prefect.resource.id": "prefect.flow-run.*"
  },
  "match_related": {},
  "after": [
    "prefect.flow-run.Resuming",
    "prefect.flow-run.AwaitingConcurrencySlot",
    "prefect.flow-run.AwaitingRetry",
    "prefect.flow-run.Running",
    "prefect.flow-run.Pending",
    "prefect.flow-run.Cancelling",
    "prefect.flow-run.Retrying"
  ],
  "expect": [
    "prefect.flow-run.*"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Proactive",
  "threshold": 1,
  "within": 10800
}
Marvin
06/30/2025, 3:13 PM
... (`prefect.flow-run.*`) within 10800 seconds (3 hours)
3. If no event occurs within that window, the automation triggers
The issue is that this configuration will trigger even for flows that complete successfully but do so after the automation checks. Here's the likely sequence:
1. Flow enters one of those states (e.g., "Running")
2. Flow completes successfully (reaches COMPLETED state)
3. But the automation is still watching that window and tries to update the state
4. Since the flow is already COMPLETED, you get the error
To fix this, you should:
1. Either reduce the `within` window (3 hours might be too long for your typical flows)
2. Or modify the `expect` list to specifically look for terminal states:
```json
"expect": [
  "prefect.flow-run.Completed",
  "prefect.flow-run.Failed",
  "prefect.flow-run.Crashed",
  "prefect.flow-run.Cancelled"
]
```
This way, the automation will only trigger if a flow doesn't reach any terminal state within the window, which is a more accurate definition of a zombie run.
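As a rough illustration of the difference between the two `expect` lists, here is a small simulation in plain Python. It is a simplified model, not Prefect's trigger evaluator: `would_fire` is a hypothetical helper that looks at a single flow run, opens one window at the first `after` event, and ignores `threshold` and `for_each`.

```python
from fnmatch import fnmatchcase

TERMINAL_EXPECT = [
    "prefect.flow-run.Completed",
    "prefect.flow-run.Failed",
    "prefect.flow-run.Crashed",
    "prefect.flow-run.Cancelled",
]


def would_fire(events, after, expect, within):
    """Toy model of a proactive trigger for one flow run.

    events: list of (seconds, event_name) tuples, sorted by time.
    Returns True when the first `after` event is NOT followed by any
    event matching `expect` within `within` seconds.
    """
    # Time of the first event that opens the window, if any.
    start = next(
        (t for t, name in events if any(fnmatchcase(name, p) for p in after)),
        None,
    )
    if start is None:
        return False  # the window never opened
    for t, name in events:
        in_window = start < t <= start + within
        if in_window and any(fnmatchcase(name, p) for p in expect):
            return False  # an expected event arrived in time
    return True


AFTER = ["prefect.flow-run.Running"]

# A run that keeps emitting non-terminal events but never finishes.
stuck = [(0, "prefect.flow-run.Running"), (60, "prefect.flow-run.Retrying")]
# A run that finishes well inside the 3-hour window.
done = [(0, "prefect.flow-run.Running"), (600, "prefect.flow-run.Completed")]

wildcard_on_stuck = would_fire(stuck, AFTER, ["prefect.flow-run.*"], 10800)
terminal_on_stuck = would_fire(stuck, AFTER, TERMINAL_EXPECT, 10800)
terminal_on_done = would_fire(done, AFTER, TERMINAL_EXPECT, 10800)
```

In this model the wildcard `expect` is satisfied by any later flow-run event, so the stuck run never fires, while the terminal-only list fires for the stuck run and stays quiet for the run that completed.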
Would you like me to explain how to modify these settings in the UI, or would you prefer to make these changes directly in the configuration?