<@ULVA73B9P> What is the difference in using flow a...
# ask-marvin
a
@Marvin What is the difference between using flow and subflow vs. flow and task level?
m
thought for 255.0 seconds
Great question. Here’s the quick mental model in Prefect 3.x:

- Flow: A deployable, schedulable pipeline with its own parameters, retries/timeouts, result persistence, logging, and state. Appears as a “flow run” in the UI. Can be paused/resumed/cancelled. Can target specific infrastructure via a deployment.
- Task: A lightweight, orchestrated unit of work inside a flow. Supports retries/timeouts, caching, concurrency (via task runners), and logging. Not deployable/schedulable on its own.
- Subflow: Calling one flow from another. From the parent it “looks like a step,” but it’s actually a full flow run with its own orchestration boundary (parameters, retries, result storage, state, logs, etc.). Shows up in the UI as a nested flow run. Can be deployed/scheduled independently and (optionally) run on different infrastructure.

When to use which

- Use a Task when you want:
  - Fine-grained steps within a single pipeline
  - Parallelism with a task runner (Thread/Process pools)
  - Task-level retries/caching
  - Lowest overhead per step
- Use a Subflow when you want:
  - A reusable pipeline component with its own parameters and policy (retries/timeouts/persistence)
  - Independent observability and state in the UI (nested flow run)
  - Human-in-the-loop controls (pause/resume/cancel) at that step
  - Isolation or different infrastructure (via a separate deployment or triggering with `run_deployment`)
  - To schedule or trigger that component on its own outside the parent

Key behavioral differences

- Orchestration boundary
  - Task: Lives inside the parent flow run; shares its lifecycle.
  - Subflow: Separate flow run with its own lifecycle and state.
- Retries/timeouts/caching
  - Both tasks and flows support retries and timeouts independently via their decorators; caching (cache policies) is configured on tasks.
  - Typical pattern: cache and retry at task level for idempotent operations; use subflow-level retries for coarse-grained pipeline retries.
- Concurrency and parallelism
  - Task: Parallelize with a task runner (e.g., `ThreadPoolTaskRunner`, `ProcessPoolTaskRunner`) and `task.submit(...)`.
  - Subflow: Runs sequentially within the parent by default; for true infra isolation or queued concurrency, call a deployed child flow with `run_deployment`.
- Parameters
  - Task: Normal Python call args.
  - Subflow: Flow parameters; visible in the UI and validated against the child flow’s signature.
- Deployments and infrastructure
  - Task: Not deployable or schedulable.
  - Subflow: Can be deployed/scheduled independently. You can trigger a child deployment from a parent to run on different infra.
- Observability and control
  - Task: Appears under the parent’s task runs; not pausable by itself.
  - Subflow: Nested flow run with its own logs, timeline, and controls (pause/resume/cancel).
- Cancellation/pausing
  - Only flows can be paused/resumed; cancelling a parent cancels its tasks and any subflows.
- Result storage/persistence
  - Both support `persist_result`, `result_storage`, and `result_serializer`. Subflow results are stored as the child flow’s result; the parent receives the returned value.

Concrete examples

1) Flow with tasks (parallelism via a task runner)
```python
from prefect import flow, task
from prefect.task_runners import ThreadPoolTaskRunner

@task(retries=2, retry_delay_seconds=5)
def fetch(i: int) -> int:
    return i * 2

@task
def combine(values: list[int]) -> int:
    return sum(values)

@flow(name="task-only-etl", task_runner=ThreadPoolTaskRunner())
def etl(n: int = 10) -> int:
    futures = [fetch.submit(i) for i in range(n)]  # parallel fetches
    results = [f.result() for f in futures]
    return combine(results)

if __name__ == "__main__":
    etl()
```
2) Flow that calls a subflow (separate orchestration boundary)

```python
from prefect import flow, task

@task
def clean(x: int) -> int:
    return x + 1

@flow(name="child-transform", retries=2, persist_result=True)
def transform(xs: list[int]) -> list[int]:
    return [clean(x) for x in xs]

@flow(name="parent-etl")
def etl_parent():
    raw = [1, 2, 3]
    # This is a subflow call: a separate flow run with its own state/logs
    cleaned = transform(raw)
    return cleaned

if __name__ == "__main__":
    etl_parent()
```

3) Parent triggers a child deployment (isolate infra / decouple execution)

```python
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent():
    # Run an existing deployment for the child flow; it will execute on the
    # child’s configured work pool/infrastructure.
    flow_run = run_deployment(
        name="child-transform/production",  # "<flow-name>/<deployment-name>"
        parameters={"xs": [1, 2, 3]},
    )
    # run_deployment returns a FlowRun object, not a future. By default it
    # polls until the child run finishes; pass timeout=0 to return immediately.
    return flow_run.state.name if flow_run.state else None

if __name__ == "__main__":
    parent()
```

Practical guidance

- Prefer tasks for most in-flow steps; they’re lighter and easy to parallelize.
- Promote a step to a subflow when you need independent orchestration or infra, reusability across pipelines, or a place you might pause/resume/cancel separately.
- Keep subflow nesting shallow for clarity and performance; prefer tasks unless you need the flow-level boundary.
- Use tags and concurrency limits to protect external systems; use task runners for parallelism within a flow run; use deployments to scale across infra.

If you can share your specific use case (e.g., long-running model training, external API fan-out, separate infra needs), I can recommend a concrete pattern.

Further reading

- Workflow design patterns with Prefect
- Prefect 3 GA: what’s new and why it matters