Tom Han
10/01/2025, 8:08 PM
When inference/inference enters "Crashed", I want to trigger db/update_inference_status with the parameters that were inputted to inference/inference.

Marvin
10/01/2025, 8:10 PM
Set this up as an Automation:
- Trigger: a flow run enters Crashed, scoped to the inference/inference deployment (or the inference flow if you prefer).
- Action: “Run deployment” = db/update_inference_status
- Parameters: Use Jinja templates to pull values from the crashed run, e.g. {{ flow_run.parameters.<param_name> }}
Example parameters JSON for the action (adjust keys to match your flow’s schema):
{
  "inference_id": "{{ flow_run.parameters.inference_id }}",
  "crashed_run_id": "{{ flow_run.id }}",
  "flow_name": "{{ flow_run.flow_name }}"
}
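For those keys to resolve, the db/update_inference_status flow needs parameters with matching names. A minimal sketch of what that flow might look like (the parameter names are just the ones assumed in the example JSON above; use your flow's real schema):

from prefect import flow

@flow(name="update_inference_status")
def update_inference_status(inference_id: str, crashed_run_id: str, flow_name: str):
    # Placeholder body: record the "crashed" status for this inference in your database here.
    print(f"Marking inference {inference_id} as crashed (run {crashed_run_id}, flow {flow_name})")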
Notes:
- You can reference any parameter of the crashed run with {{ flow_run.parameters.<name> }} or {{ flow_run.parameters['<name>'] }}.
- Make sure db/update_inference_status is a deployment (automations can only start deployments).
- In the automation builder, limit the trigger to your specific flow/deployment to avoid firing on others.
Docs with details and more examples:
- Accessing parameters in automation templates: Access parameters in templates
- Automations overview: Automations how-to
If you share the exact parameter name(s) you need from inference/inference, I can give you the precise JSON to paste into the action.

Tom Han
10/01/2025, 8:15 PM

Marvin
10/01/2025, 8:19 PM
- Request cancellation from the CLI: prefect flow-run cancel <flow-run-id>
- Or just wait: after heartbeats stop and the grace period passes, Prefect will reconcile the state (in some setups you may see it flip to Cancelled; in others, if there was no cancel request in time, it may become Crashed). The sketch below shows how to poll for this from a script.
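A minimal polling sketch with the Prefect Python client, if you'd rather watch for that reconciliation from code than refresh the UI (the poll interval is arbitrary):

import asyncio
from prefect.client.orchestration import get_client

async def wait_for_terminal_state(flow_run_id: str, poll_seconds: int = 15):
    # Poll the API until the run reaches a terminal state (Cancelled, Crashed, Completed, ...)
    async with get_client() as client:
        while True:
            flow_run = await client.read_flow_run(flow_run_id)
            if flow_run.state and flow_run.state.is_final():
                return flow_run.state
            await asyncio.sleep(poll_seconds)

# Usage:
# final_state = asyncio.run(wait_for_terminal_state("<flow-run-id>"))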
Helpful docs:
- Cancel workflows
- Detect zombie flows
If you share which work pool/worker type you’re using (Kubernetes, Docker, ECS, Process, etc.), I can be more specific about how/when Prefect enforces the Cancelling -> Cancelled transition for that infrastructure.

Tom Han
10/01/2025, 8:32 PM

Marvin
10/01/2025, 8:34 PM
- UI: open the stuck flow run and use “Set state” to force it to Cancelled.
- CLI: prefect flow-run cancel <flow-run-id> sets the run to Cancelling. If the worker is already dead, it may not progress to Cancelled on its own. In that case, use the UI “Set state” above or the API/Python snippet below.
- Python API (force set state):
import asyncio
from prefect.client.orchestration import get_client
from prefect.states import Cancelled

async def force_cancel(flow_run_id: str):
    async with get_client() as client:
        await client.set_flow_run_state(flow_run_id, state=Cancelled())

# Usage:
# asyncio.run(force_cancel("<your-flow-run-id>"))
A few notes:
- If the worker/infrastructure is already gone, Prefect won’t get an acknowledgment, so “Cancelling” can linger until backend reconciliation. Manually setting the state is the right move in that scenario.
- After forcing Cancelled, double-check any external resources (pods/containers/etc.) since your worker was killed and may not have cleaned them up.
Docs:
- Cancel workflows
- Detect zombie flows
If you’d like, tell me the run ID and I’ll share the exact command/script to run, or help verify if your setup is on Cloud vs. self-hosted and what heartbeat timing to expect.

Tom Han
10/01/2025, 9:05 PM
item doesn't exist
     99     getattr(self._val_or_ser, item)
--> 100     raise PydanticUserError(self._error_message, code=self._code)
PydanticUserError: StateCreate is not fully defined; you should define all referenced types, then call StateCreate.model_rebuild().
For further infor

Marvin
10/01/2025, 9:08 PM
Fastest way to clear it (CLI):
prefect flow-run cancel e975b675-78e6-4e50-b0fd-f180b836e324
If the worker is gone, it may not progress; use the UI “Set state” to force Cancelled.
Fix your Python approach
- Add force=True and run in a fresh Python process (not a long-lived notebook kernel):
import asyncio
from prefect.client.orchestration import get_client
from prefect.states import Cancelled

async def force_cancel(flow_run_id: str):
    async with get_client() as client:
        await client.set_flow_run_state(flow_run_id, state=Cancelled(), force=True)

asyncio.run(force_cancel("e975b675-78e6-4e50-b0fd-f180b836e324"))
- If you must stay in the same notebook/kernel, run the following once before calling set_flow_run_state to work around the pydantic “not fully defined” error:
from prefect.client.schemas.actions import StateCreate
StateCreate.model_rebuild()
Then call the force_cancel function above.
REST fallback (bypasses the client models entirely)
- Works with Prefect Cloud or self-hosted; set API_URL/API_KEY appropriately:
import os, httpx

API_URL = os.getenv("PREFECT_API_URL")   # e.g. https://api.prefect.cloud/api/accounts/.../workspaces/...
API_KEY = os.getenv("PREFECT_API_KEY")   # Cloud only; omit for open server
FLOW_RUN_ID = "e975b675-78e6-4e50-b0fd-f180b836e324"

payload = {
    "state": {"type": "CANCELLED", "name": "Cancelled", "state_details": {}},
    "force": True,
}
headers = {"Authorization": f"Bearer {API_KEY}"} if API_KEY else {}

with httpx.Client(timeout=30) as c:
    r = c.post(f"{API_URL}/flow_runs/{FLOW_RUN_ID}/set_state", json=payload, headers=headers)
    r.raise_for_status()
    print(r.json())
If you’re seeing a lot of zombie runs linger in Cancelling, that’s expected when the worker is dead; forcing Cancelled as above is the right move (a bulk version is sketched after the list below). If possible, share:
- Output of prefect version
- Whether you’re on Cloud or self-hosted
- Whether you’re executing from a notebook with autoreload
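For the bulk case, a sketch that finds every run currently stuck in Cancelling and force-sets each one to Cancelled. It assumes the standard flow-run state-name filters from prefect.client.schemas.filters; adjust the filter or comment out the set-state call for a dry run if you want to review the matches first:

import asyncio
from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterState,
    FlowRunFilterStateName,
)
from prefect.states import Cancelled

async def force_cancel_stuck_runs():
    # Find runs reported as Cancelling and force each one to Cancelled
    async with get_client() as client:
        stuck = await client.read_flow_runs(
            flow_run_filter=FlowRunFilter(
                state=FlowRunFilterState(name=FlowRunFilterStateName(any_=["Cancelling"]))
            )
        )
        for run in stuck:
            await client.set_flow_run_state(run.id, state=Cancelled(), force=True)
            print(f"Forced {run.name} ({run.id}) to Cancelled")

# asyncio.run(force_cancel_stuck_runs())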
Docs:
- Cancel workflows
- Detect zombie flows