# marvin-ai
r
@Marvin what should a flow failure mean semantically? like should all flows always ideally be successful, or if something fails to run or whatever in the workflow should i try to gracefully handle it, and then still raise an exception to mark the flow run as a failure? if i have a parent flow running multiple subflows, should i have child flows marked as failing but the parent orchestrator flow successful?
m
Great question. Short answer: let the flow’s final state reflect whether it achieved its business objective. In Prefect 3.x, “Completed” should mean “the thing this flow promises to do was done,” not “no exceptions ever occurred.” If something goes wrong but you handle it and the objective is still met, it’s perfectly fine for the flow to finish Completed. If the objective was not met, let the flow end Failed (by raising).

Key semantics (Prefect 3.x)
- Completed: the flow met its objective.
- Failed: an error prevented the flow from meeting its objective (usually an unhandled exception raised in the flow body).
- Crashed: unexpected termination (e.g., the process died).
- Cancelled: the run was explicitly cancelled.

Docs: States, Retries

When to fail vs. handle
- If a step is non-critical and you can proceed, catch and log the error, then continue. The flow can still end Completed.
- If a failure means the flow’s promise is broken, let the exception bubble up (or raise your own) so the flow ends Failed.
- Use retries for transient errors, at the flow level:

```python
@flow(retries=3, retry_delay_seconds=60)
```

or by passing the same `retries` / `retry_delay_seconds` arguments to `@task`. See Retries.

Parent and child (subflows)
- Subflows run in the same process and behave like callable units. If a child flow raises, the exception propagates to the parent unless you catch it.
- Two common approaches:
  - Fail the parent if any critical child fails.
  - Let the parent end Completed while recording/reporting the failed children, if the parent’s job is “orchestrate and report” (best-effort pattern).

Patterns you can use

1) Orchestrator that completes even when children fail (set `require_all_success=True` to make it fail instead)
```python
from prefect import flow, get_run_logger

@flow
def child_flow(i: int) -> int:
    if i % 2:
        raise RuntimeError(f"child {i} failed")
    return i * 2

@flow(retries=1, retry_delay_seconds=30)
def parent_flow(n: int = 5, require_all_success: bool = False):
    log = get_run_logger()
    successes, failures = [], []

    for i in range(n):
        try:
            out = child_flow(i)  # child exceptions propagate unless caught
            successes.append((i, out))
        except Exception as exc:
            log.warning(f"Child {i} failed: {exc}")
            failures.append(i)

    if require_all_success and failures:
        raise RuntimeError(f"{len(failures)} child flows failed: {failures}")

    # Completed if we didn’t raise; still returns a clear summary
    return {"successes": successes, "failures": failures}
```
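To mix the two approaches (fail the parent only when a *critical* child fails, otherwise complete and report), the decision can be pulled into a small helper. This is a plain-Python sketch, not a Prefect API: `summarize_children` and the idea of a `critical` set of child indices are hypothetical names for illustration. Called at the end of a parent like `parent_flow`, a raise ends the parent Failed, while a normal return lets it end Completed.

```python
def summarize_children(failures: list[int], critical: set[int]) -> dict[str, list[int]]:
    """Decide a parent flow's outcome from the indices of failed children.

    Hypothetical helper: raises if any critical child failed (so the calling
    flow ends Failed); otherwise returns a summary of the optional children
    that failed, which the flow can return while ending Completed.
    """
    failed = set(failures)
    critical_failures = sorted(failed & critical)
    if critical_failures:
        raise RuntimeError(f"critical child flows failed: {critical_failures}")
    return {"failed_optional": sorted(failed - critical)}


# Hypothetical usage at the end of parent_flow, after the loop fills `failures`:
#     return {"successes": successes, **summarize_children(failures, critical={0, 2})}
```

Keeping the policy in one function makes the parent’s success criteria explicit and easy to test without running any flows.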
2) Best-effort tasks with downstream aggregation

If you want a downstream task to run even when some of its upstream tasks failed (instead of being blocked by the upstream failure), wrap the futures in `allow_failure`:
```python
from prefect import flow, task, allow_failure

@task
def maybe_parse(x: str) -> int:
    if "bad" in x:
        raise ValueError("parse failed")
    return len(x)

@task
def aggregate(items):
    # With allow_failure, a failed upstream is passed in as its exception
    # object instead of blocking this task; decide what “success” means here
    ok, errs = [], []
    for item in items:
        if isinstance(item, BaseException):
            errs.append(str(item))
        else:
            ok.append(item)
    return {"ok": ok, "errors": errs}

@flow
def pipeline(payloads: list[str]):
    futs = [maybe_parse.submit(p) for p in payloads]
    safe = [allow_failure(f) for f in futs]
    summary = aggregate(safe)
    # Completed if aggregate didn't raise; you can raise if errors > threshold
    return summary
```
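The “raise if errors > threshold” idea in the `pipeline` comment can be made concrete with an error budget. A minimal sketch, assuming the `{"ok": [...], "errors": [...]}` shape returned by `aggregate`; `enforce_error_budget` and `max_error_ratio` are hypothetical names, not part of Prefect. Calling it in `pipeline` before returning would make the flow end Failed once too many items fail, and Completed otherwise.

```python
def enforce_error_budget(summary: dict[str, list], max_error_ratio: float = 0.2) -> dict[str, list]:
    """Raise if the share of failed items exceeds max_error_ratio.

    Hypothetical helper: meant to be called in the flow body so that a raise
    marks the flow run Failed; otherwise the summary is passed through as-is.
    """
    n_ok, n_err = len(summary["ok"]), len(summary["errors"])
    total = n_ok + n_err
    # An empty batch has no errors, so it is treated as within budget.
    if total and n_err / total > max_error_ratio:
        raise RuntimeError(
            f"{n_err}/{total} items failed, over the {max_error_ratio:.0%} error budget"
        )
    return summary


# Hypothetical usage inside pipeline, replacing the plain return:
#     return enforce_error_budget(summary, max_error_ratio=0.2)
```

A ratio rather than an absolute count keeps the same rule meaningful for small and large batches; an absolute cap is an equally valid choice if batch sizes are fixed.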
Hooks and notifications

You can attach state-change hooks such as `on_failure` and `on_completion` to flows for alerts and side effects without changing the success criteria. See the flow decorator options (e.g., `on_failure`) in the API reference.

Practical guidance
- Define success criteria per flow. Completed means those criteria are satisfied.
- Catch and log non-critical issues; fail only when the objective cannot be met.
- For parent/child:
  - Orchestrator/reporting flows can end Completed even if some children Failed (surface the failures in logs/artifacts).
  - SLA-style parents should fail when any required child fails (this enables retries, alerts, and incident response).
- Use retries for transient failures at the most granular level that makes sense (often tasks, sometimes the child flow).

If you can share your specific objective for the parent flow (e.g., “all child loads must succeed” vs. “load as many as possible and report”), I can suggest an exact pattern and code skeleton to match it.
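The “catch and log non-critical issues; fail only when the objective cannot be met” guideline can be expressed as a small wrapper around each step. This is a plain-Python sketch: `run_step`, its `critical` flag, and the `load_data` / `send_optional_report` steps are hypothetical, and it uses the standard `logging` module; inside a Prefect flow you would more likely log through `get_run_logger()`. A re-raised exception from a critical step ends the flow Failed, while a swallowed non-critical one lets it still end Completed.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("flow_steps")


def run_step(step: Callable[[], T], *, name: str, critical: bool) -> Optional[T]:
    """Run one unit of work and apply a fail-vs-handle policy (hypothetical helper)."""
    try:
        return step()
    except Exception:
        if critical:
            # The flow's promise is broken: re-raise so the run ends Failed.
            raise
        # Non-critical: record the error and keep going; the run can still end Completed.
        logger.warning("non-critical step %r failed; continuing", name, exc_info=True)
        return None


def load_data() -> list[int]:
    # Hypothetical critical step that succeeds.
    return [1, 2, 3]


def send_optional_report() -> None:
    # Hypothetical non-critical step that fails.
    raise ConnectionError("report service unavailable")


data = run_step(load_data, name="load", critical=True)
run_step(send_optional_report, name="report", critical=False)  # logged, returns None
```

Marking criticality at the call site keeps the flow body readable: each step states whether its failure breaks the flow’s promise.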