# ask-marvin
@Marvin i have a flow that has multiple tasks dynamically added based on a list. currently i deploy and run them locally with a module type entrypoint in the deployment. is it possible to create a task within this flow that spins up and runs in a separate docker container?
thought for 193.7 seconds
Short answer: not as a "task." In Prefect 3.x, infrastructure (like Docker) is attached at the flow-run level, not per-task. If you want a piece of work to run in its own container, make it a subflow and give that subflow its own infrastructure.

Recommended pattern
- Extract the containerized work into a subflow.
- Deploy that subflow to a Docker work pool (image, env, volumes, etc.).
- From your parent flow (running locally), call the subflow deployment for each item in your list. Each call spins up its own containerized flow run.

Example:
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent(items: list[str]):
    # fan out: one containerized run per item
    for item in items:
        # "heavy-job/heavy-docker" => flow-name/deployment-name
        run_deployment(
            name="heavy-job/heavy-docker",
            parameters={"item": item}
        )

# Your separate subflow (deployed to a Docker work pool)
# @flow
# def heavy_job(item: str):
#     ...
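A detail to be aware of: by default `run_deployment` waits for the triggered run to finish before returning, so the loop above launches containers one at a time. If you want every item running concurrently, you can ask it to return as soon as the child run is created:

```python
# Create the child flow run and return right away instead of polling for completion
run_deployment(
    name="heavy-job/heavy-docker",
    parameters={"item": item},
    timeout=0,  # 0 = don't wait for the run to finish
)
```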
How to set it up
- Create a Docker work pool (the UI is easiest).
- Deploy the subflow to that pool with your desired image and job variables. In Prefect 3.x you'd typically use a from-source deployment, e.g. `flow.from_source(...).deploy(...)` (see the sketch below). Deploy docs: Deploying in Prefect 3.
- Trigger it via `run_deployment` as shown above. API ref: run_deployment.
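To make that concrete, here's a minimal sketch of registering such a subflow deployment with `from_source(...).deploy(...)`. The repo URL, entrypoint, pool name, and image are placeholders for your own values, and it assumes the image is already built and reachable by the worker:

```python
from prefect import flow

if __name__ == "__main__":
    flow.from_source(
        source="https://github.com/you/your-repo",   # placeholder: where the subflow code lives
        entrypoint="flows/heavy_job.py:heavy_job",    # placeholder: path:function inside that source
    ).deploy(
        name="heavy-docker",                          # callable as "heavy-job/heavy-docker"
        work_pool_name="docker-pool",                 # your Docker work pool
        image="myorg/myimg:tag",                      # existing image for the flow run
        build=False,                                  # assume the image is already built...
        push=False,                                   # ...and already pushed/available to the worker
    )
```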
Alternatives (if you must)
- Fire-and-forget a container from a task: use the `prefect-docker` collection to create/start/monitor a container from within your local flow run. This won't be a Prefect-managed task run; you'll be managing the container lifecycle and error handling yourself.
  - Functions you'd use: `create_docker_container`, `start_docker_container`, `get_docker_container_logs`, `remove_docker_container`.
  - Docs: prefect-docker collection
- Experimental shortcut: decorate a subflow to always run via a Docker work pool without creating a deployment:
from prefect import flow
from prefect_docker.experimental.decorators import docker

@docker(work_pool="docker-pool", image="myorg/myimg:tag")
@flow
def heavy_job(item: str):
    ...
Then call `heavy_job(item)` from your parent flow and it'll spin up a Docker-backed subflow run. It's experimental; deployments are more battle-tested.
- Source: prefect-docker repo

If you tell me whether you want one container per list item or just a single offloaded step occasionally, I can sketch the exact wiring (image, env, volumes, result passing) to keep it tidy.
@Marvin my current flow is being deployed using my_deployment = my_flow.ato_deployment(...) await my_deployment.apply() does the first pattern still work? also consider that all of these are also running within a docker container by itself already. do both recommendations still work?
thought for 44.9 seconds
A few things to untangle:

1) About your deployment API
- If you meant `my_flow.to_deployment(...); await my_deployment.apply()`: yes, that'll register a deployment the parent flow can use. In Prefect 3.x the recommended API is `flow.from_source(...).deploy(...)`, but if you already have a registered deployment, the pattern below still works just fine.
- If you really have `ato_deployment`, I suspect a typo. If not, mind sharing your Prefect version? That'll help me avoid guessing.

2) Does the "run a subflow in its own Docker container" pattern still work?
- Yes. It doesn't matter how the parent is deployed (module entrypoint, local, etc.). From the parent flow, call `run_deployment(...)` for the subflow that's deployed to a Docker work pool. Each call creates a new flow run that a Docker worker will execute in its own container.
- Example fan-out:
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent(items: list[str]):
    for item in items:
        run_deployment("heavy-job/heavy-docker", parameters={"item": item})
- API ref: run_deployment
- Deploy docs (3.x): Deploying in Prefect 3

3) You're already running the parent inside Docker: does that change anything?
- For the subflow-via-Docker-work-pool pattern: still fine. The parent just asks the API to start another flow run. A Docker worker (wherever it's running) will launch the container. If your worker is also in Docker, make sure it can access Docker (bind-mount `/var/run/docker.sock` or run DinD).
- For the "start a container from a task" pattern using `prefect-docker`: also fine, but your parent container must be able to talk to the Docker daemon (same socket/DinD caveats). You'll be managing the container lifecycle yourself.
  - Functions: `create_docker_container`, `start_docker_container`, `get_docker_container_logs`, `remove_docker_container`
  - Collection docs: prefect-docker

Optional/experimental
- You can mark a subflow to always run on a Docker work pool without creating a deployment using:
from prefect import flow
from prefect_docker.experimental.decorators import docker

@docker(work_pool="docker-pool", image="myorg/myimg:tag")
@flow
def heavy_job(item: str):
    ...
- Then call `heavy_job(item)` from the parent. This is experimental; deployments are more battle-tested.
- Repo: prefect-docker

If you confirm your Prefect version and the exact method you're using (`to_deployment` vs `deploy`), I can show the precise snippet to register the subflow deployment and wire up the Docker work pool cleanly.
@Marvin prefect version = 3.4.10, yes using my_flow.ato_deployment
thought for 48.0 seconds
Thanks, 3.4.10 noted. And yes, `my_flow.ato_deployment(...); await my_deployment.apply()` is a valid 3.x pattern (the async variant), so you don't need to change that for this to work.

Answers to your two checks:
- Does the "subflow via run_deployment to a Docker work pool" pattern still work? Yes. It's independent of how the parent flow is deployed. As long as the subflow has a registered deployment targeting a Docker work pool, calling `run_deployment("flow-name/deployment-name", parameters=...)` from the parent will spin up a separate Docker container for each call.
  - API ref: run_deployment
  - Deploy docs: Deploying in Prefect 3
- You're already running the parent in Docker; do both approaches still work?
  - Subflow via Docker work pool: Yes. Only the Docker worker needs access to a Docker daemon. The parent container doesn't need Docker access for this path; it just asks the API to start another flow run that the worker executes.
  - Manually launching a container from a task (using `prefect-docker`): Also yes, but then your parent container must be able to talk to Docker (e.g., mount `/var/run/docker.sock` or run with DinD). You'll manage container lifecycle and error propagation yourself.
  - Collection docs: prefect-docker

Two practical caveats to keep life less exciting:
- Make sure your subflow deployment either references a base image that already contains your code or uses a source (e.g., Git) the Docker worker can pull. Otherwise the worker's container won't have your flow code.
- If you go the manual `prefect-docker` route, remember to clean up containers on failure paths and consider how you'll surface logs/results back to the flow.

If you want, share the name of your Docker work pool and how you package the subflow code (baked into the image vs. from_source), and I'll sketch the exact deployment + `run_deployment` wiring.
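Since you're already on the async registration API, here's a minimal sketch of registering the Docker-targeted subflow deployment the same way. The names, pool, and image are placeholders, and it assumes the image already contains the subflow code (per the caveat above):

```python
from prefect import flow

@flow
def heavy_job(item: str):
    ...

async def register_child_deployment():
    # Async variant of to_deployment(), matching your existing ato_deployment/apply pattern
    child = await heavy_job.ato_deployment(
        name="heavy-docker",                          # callable as "heavy-job/heavy-docker"
        work_pool_name="docker-pool",                 # your Docker work pool
        job_variables={"image": "myorg/myimg:tag"},   # image must already contain this flow's code
    )
    await child.apply()
```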
@Marvin how do you provide the "Docker worker needs access to a Docker daemon"? by just mounting volumes: - ./src:/home/worker/src - /var/run/docker.sock:/var/run/docker.sock in docker-compose for the worker, it still doesnt allow subprocess to call docker. Task run failed with exception: RuntimeError('Docker container failed: /docker_run_pipeline.sh: line 84: docker: command not found\n')
thought for 79.8 seconds
That error tells you exactly what's wrong: your container doesn't have the `docker` CLI installed. Mounting `/var/run/docker.sock` just gives you a socket to talk to the daemon; you still need a client inside the container, and it needs permission to use the socket.

You've got three viable paths:

1) Keep shelling out to `docker` (install the CLI + handle permissions)
- Extend your worker image to include the Docker CLI.
- Debian/Ubuntu-based Prefect image:
FROM prefecthq/prefect:3-python3.11
RUN apt-get update && apt-get install -y --no-install-recommends docker.io && rm -rf /var/lib/apt/lists/*
- Alpine-based (only if your base image is actually Alpine; the official Prefect images are Debian-based):
FROM your-alpine-base-image
RUN apk add --no-cache docker-cli
- docker-compose snippet (run as root to dodge socket group headaches, or add the user to the socket’s group):
services:
    worker:
      build: .
      user: root
      environment:
        - PREFECT_API_URL=...
        - PREFECT_API_KEY=...
        - DOCKER_HOST=unix:///var/run/docker.sock
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock
        - ./src:/home/worker/src
      command: >
        prefect worker start --pool "docker-pool"
- If you don’t want to run as root, ensure the container user is in the group that owns the socket:
# inside the container at startup
DOCKER_GID=$(stat -c '%g' /var/run/docker.sock)
groupadd -g "$DOCKER_GID" docker || true
usermod -aG docker <your-user>
Then drop `user: root`.

2) Don't use the CLI at all; use the Docker SDK (what `prefect-docker` does)
- No `docker` binary required; just mount the socket and install the Python package.
- Example within a task:
from prefect import task
from prefect_docker.containers import (
    create_docker_container,
    start_docker_container,
    get_docker_container_logs,
    remove_docker_container,
)

@task
def run_container(image: str, command: list[str]):
    # create_docker_container returns a Container object; the other helpers take its id
    container = create_docker_container(image=image, command=command)
    try:
        start_docker_container(container.id)
        logs = get_docker_container_logs(container.id)
        return logs
    finally:
        remove_docker_container(container.id)
- Ensure the worker image has `prefect[docker]` (and thus the Docker SDK) installed and the socket mounted:
pip install "prefect[docker]"
- Docs: prefect-docker

3) Use a sidecar DinD daemon instead of the host socket
- Run a `docker:dind` service in your compose file and point `DOCKER_HOST` at it:
services:
    dind:
      image: docker:dind
      privileged: true
    worker:
      image: your-worker
      environment:
      - DOCKER_HOST=tcp://dind:2375
      depends_on:
        - dind
- Still need the CLI (path 1) or the SDK (path 2) in the worker.

Given your error, the minimal fix is: install the Docker CLI in your worker image. After that, if you hit "permission denied" against the socket, run the worker as root or add the user to the socket's group. If you can change your code, switching to the Docker SDK (`prefect-docker`) is cleaner and avoids the CLI entirely.
@Marvin provide more details on the prefect-docker route, what is needed to be installed and setup for the worker docker container, what should the pattern look like from a task within a flow to create container etc.
thought for 271.7 seconds
Got it. Here's the "Prefect + Docker" route end-to-end, with what to install, how to run a Docker worker (including from a Docker container), how to build/push images with deployments, and a task pattern to create/operate containers from within your flow. Assumptions: Prefect 3.x.

Links
- Docker worker how-to: Docker worker in Prefect 3
- CLI help we verified:
  - `prefect work-pool create --help`
  - `prefect worker start --help`
  - `prefect deploy --help`
- Docker collection API (what you call from tasks): functions live in the `prefect-docker` collection:
  - Container helpers: `prefect_docker.containers` (create/start/stop/remove/get logs)
  - Deployment build steps: `prefect_docker.deployments.steps` (build/push image)
  - Worker config type: `prefect_docker.worker.DockerWorkerJobConfiguration`
- Prefect image: prefecthq/prefect

What to install
- On the machine running the worker (host or container):
  - Docker Engine available to the worker
  - Python and Prefect 3
  - The Docker collection package
- Commands:
pip install "prefect>=3" prefect-docker
If you prefer “one-liner with extras” for CLI use:
pip install "prefect[docker]"
Setting up a Docker work pool and worker

1) Create a Docker work pool:
prefect work-pool create my-docker-pool --type docker
2) Build and push your flow image during deploy (recommended)
Use `prefect.yaml` with the docker build/push steps from the `prefect-docker` collection, then point your deployment at the resulting image. Example `prefect.yaml`:
name: my-project

build:
  - prefect_docker.deployments.steps.build_docker_image:
      id: build_image
      image_name: ghcr.io/you/prefect-app
      tag: "{{ flow_version | default('latest') }}"
      dockerfile: Dockerfile
  - prefect_docker.deployments.steps.push_docker_image:
      requires: build_image
      image_name: "{{ build_image.image_name }}"
      tag: "{{ build_image.tag }}"
      credentials: "{{ prefect.blocks.docker-registry-credentials.my_creds }}"

deployments:
  - name: my-deployment
    entrypoint: flows/my_flow.py:my_flow
    work_pool:
      name: my-docker-pool
    job_variables:
      image: "{{ build_image.image_name }}:{{ build_image.tag }}"
      env:
        EXAMPLE: hello
      volumes:
        - "/host/data:/app/data:ro"
      network_mode: bridge
      auto_remove: true
3) Run a worker
- On a host with Docker:
prefect worker start --pool my-docker-pool
- Inside a Docker container (so it can launch sibling containers), mount the Docker socket:
docker run --rm -it \
  -e PREFECT_API_URL="https://api.prefect.cloud/api/accounts/<acct>/workspaces/<ws>" \
  -e PREFECT_API_KEY="<your-api-key>" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.prefect:/root/.prefect \
  prefecthq/prefect:3-latest \
  prefect worker start --pool my-docker-pool
Notes:
- Mounting `/var/run/docker.sock` gives the worker access to the host Docker daemon. Your security team will have opinions.
- For Prefect Server, use your server URL in `PREFECT_API_URL`.

Optional: a basic Dockerfile for your flow image
FROM prefecthq/prefect:3-latest
WORKDIR /opt/prefect
COPY . /opt/prefect
RUN pip install -U pip && \
    pip install -r requirements.txt
ENV PYTHONPATH=/opt/prefect
Pattern: create and manage a Docker container from a task

You generally don't need to do this for "normal" Prefect runs: the Docker worker already launches a container for each flow run. But if your task needs to spin up a sibling container (e.g., run a tool that only exists as a container), use the `prefect-docker` collection. This pattern:
- Pulls a private image if needed (optional)
- Creates and starts the container
- Streams logs, waits, checks exit status
- Cleans up in finally

```python
from prefect import flow, task
from prefect_docker.containers import (
    create_docker_container,
    start_docker_container,
    get_docker_container_logs,
    stop_docker_container,
    remove_docker_container,
)
from prefect_docker.host import DockerHost
from prefect_docker.credentials import DockerRegistryCredentials


@task
def run_sidecar_job(
    image: str,
    command: list[str] | None = None,
    env: dict[str, str] | None = None,
    volumes: list[str] | None = None,
    docker_host_url: str | None = None,
    registry_block_name: str | None = None,
) -> dict:
    docker_host = DockerHost.from_url(docker_host_url) if docker_host_url else None

    # Optional: authenticate to a private registry via a block
    if registry_block_name:
        creds = DockerRegistryCredentials.load(registry_block_name)
        creds.login()

    container = None
    try:
        container = create_docker_container(
            image=image,
            command=command,
            name=None,              # let Docker name it
            detach=True,            # run in background
            environment=env or {},
            docker_host=docker_host,
            volumes=volumes or [],  # e.g. ["/host/path:/container/path:ro"]
            auto_remove=False,      # we'll handle cleanup
        )
        start_docker_container(container.id, docker_host=docker_host)

        # Stream logs until exit
        logs = get_docker_container_logs(
            container.id,
            docker_host=docker_host,
            stream=False,  # use True to yield logs incrementally
            follow=True,
        )

        # Inspect exit code
        # If you need exit status, query the underlying SDK via container.attrs or re-fetch.
        # A simple pattern is to try a stop (no-op if already exited) and then inspect:
        try:
            stop_docker_container(container.id, docker_host=docker_host, timeout=1)
        except Exception:
            pass
        # Re-fetch the container to get state
        # (If using the docker SDK directly, it's container.reload(); here we keep it simple.)
        # As a shortcut, if your command is deterministic, you can just treat non-empty logs as success.

        return {
            "container_id": container.id,
            "logs": logs,
        }
    finally:
        if container:
            try:
                remove_docker_container(container.id, docker_host=docker_host, force=True)
            except Exception:
                pass


@flow
def orchestrate():
    result = run_sidecar_job(
        image="alpine:3.20",
        command=["sh", "-c", "echo hello from container && sleep 1 && echo done"],
        env={"EXAMPLE": "value"},
        volumes=[],
        docker_host_url=None,        # or "unix:///var/run/docker.sock"
        registry_block_name=None,    # e.g., "my_dockerhub_creds"
    )
    return result


if __name__ == "__main__":
    orchestrate()
```

Where this runs:
- If your flow runs in a container (via the Docker worker), the task above will create sibling containers on the same Docker host, as long as the worker has access to the Docker daemon (e.g., /var/run/docker.sock mounted).
- If your flow runs locally, it uses your local Docker daemon.
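The sketch above punts on the exit status. If you need it, one option (my addition, using the plain Docker SDK that `prefect-docker` wraps) is to wait on the container before cleanup:

```python
import docker  # the docker SDK; installed alongside prefect-docker / prefect[docker]


def wait_for_exit(container_id: str, docker_host_url: str | None = None) -> int:
    """Block until the container exits and return its exit code."""
    client = docker.DockerClient(base_url=docker_host_url) if docker_host_url else docker.from_env()
    result = client.containers.get(container_id).wait()  # e.g. {"StatusCode": 0, ...}
    return result["StatusCode"]


# Inside run_sidecar_job, before the finally-block cleanup, you might do:
# exit_code = wait_for_exit(container.id)
# if exit_code != 0:
#     raise RuntimeError(f"Sidecar container exited with code {exit_code}")
```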
Using job variables on deployments (what the worker respects)
The Docker worker reads its "job config" from your deployment. Important fields (you can set them in `job_variables`):
- `image` (required): the flow-run image
- `env`: dictionary of environment variables for the flow run
- `volumes`: e.g., `["/host/path:/container/path:rw"]`
- `network_mode`: e.g., `bridge` or a custom network
- `mem_limit`, `memswap_limit`, `privileged`, `auto_remove`, `image_pull_policy` (IfNotPresent|Always|Never)

Signature reference: `prefect_docker.worker.DockerWorkerJobConfiguration`
Common gotchas and best practices
- If running the worker in Docker, mount the Docker socket (`-v /var/run/docker.sock:/var/run/docker.sock`) so it can start flow-run containers.
- Use a small, deterministic tag strategy in `prefect.yaml` (e.g., commit SHA or flow version).
- For private registries, use a `DockerRegistryCredentials` block and reference it in the push step, or log in on the worker.
- Keep `auto_remove: true` for flow-run containers to avoid littering the host; for debugging, set it to false.
- Don't create containers from tasks unless you truly need to; it's extra moving parts. The Docker worker already launches a container for each flow run.

If you share how you're deploying (local vs containerized worker), private registry details, and whether your flow needs to run nested containers for work, I can tailor the exact `prefect.yaml`, Dockerfile, and worker command for your case. I live to serve. Begrudgingly.
@Marvin i am running the prefect worker in its own docker container. how about paths? will there be path conflicts between: host running prefect worker container, prefect worker container, and sibling container? especially considering the passing of Dockerfile / container entrypoint script all the way till the sibling container?
thought for 141.8 seconds
Short answer: yes, path confusion is common when the worker runs in a container. The worker's filesystem is not the flow-run container's filesystem, and neither one is the host's. You must think in terms of the Docker host, because that's where the Docker daemon runs and where volume paths are resolved.

Key rules
- The worker container and flow-run containers are siblings; they do not share files unless you mount the same host paths or named volumes into both.
- Volume specs (e.g., `/host/path:/container/path:rw`) in deployment job variables and in `prefect_docker.containers.create_docker_container(...)` are interpreted on the Docker host, not inside the worker or flow-run container.
- The worker does not "pass" your Dockerfile or entrypoint script at runtime; those belong in the image you build during `prefect deploy`. The worker just pulls an image and runs it.

Links
- Docker worker how-to: Prefect 3 Docker worker
- Prefect Docker collection (APIs you might call from tasks): prefect-docker (module contains `containers`, `deployments.steps`, `worker`)
- CLI refs we verified: `prefect work-pool create --help`, `prefect worker start --help`, `prefect deploy --help`

What to mount where
- Worker container (so it can launch siblings):
  - Mount the Docker socket: `-v /var/run/docker.sock:/var/run/docker.sock`
  - Mount the Prefect config dir if you want persistence: `-v ~/.prefect:/root/.prefect`
  - Nothing else is required for the worker to launch flow-run containers.
- Flow-run containers (what the worker starts per flow run):
  - If your tasks will start "sibling" containers, the flow-run container also needs Docker access. Mount the Docker socket in deployment job variables:
job_variables:
        volumes:
          - "/var/run/docker.sock:/var/run/docker.sock"
- Any files you want available to both the flow-run container and sibling containers must come from the host (or a named volume) and be mounted into both.

Two safe patterns to avoid path hell

1) Named volume pattern (best for sharing scratch/work dirs)
- Create/use a named volume (Docker will create it if missing).
- Mount it into both the flow-run container (via deployment) and sibling containers (via your task):
- Deployment:
job_variables:
      volumes:
        - "shared-data:/mnt/shared:rw"
        - "/var/run/docker.sock:/var/run/docker.sock"
- Task creating a sibling container:
create_docker_container(
      image="alpine:3.20",
      command=["sh", "-c", "ls -la /mnt/shared && echo done"],
      volumes=["shared-data:/mnt/shared:rw"],
      detach=True,
    )
- No host paths to keep in sync; the engine manages the named volume.

2) Absolute host path pattern (if you need a real host directory)
- Pick a concrete host path and mount it identically into both containers:
- Deployment:
job_variables:
      volumes:
        - "/srv/data:/srv/data:rw"
        - "/var/run/docker.sock:/var/run/docker.sock"
- Task creating a sibling container:
create_docker_container(
      image="alpine:3.20",
      command=["sh", "-c", "ls -la /srv/data && echo done"],
      volumes=["/srv/data:/srv/data:rw"],
      detach=True,
    )
- Inside your flow-run code, when you call `create_docker_container(..., volumes=...)`, remember the left-hand side is a host path, not the flow-run container's path.

What not to do
- Don't use paths that only exist inside the worker container in your deployment `volumes`. The Docker daemon will happily create a new host directory (e.g., `/data`) and mount that instead, which is… not what you meant.
- Don't rely on relative paths. Absolute only.
- Don't assume the worker's current working directory matters to flow-run/sibling containers. It doesn't.

About Dockerfile and entrypoint scripts
- These are resolved at image build time, during `prefect deploy` (e.g., `prefect_docker.deployments.steps.build_docker_image` and `push_docker_image`). The worker never needs your Dockerfile or your entrypoint script on disk.
- If your sibling container needs its own script, bake it into that sibling image too. There's no "pass-through" of files from the worker or flow-run container unless you mount them from the host or a named volume.

If you must build images at runtime inside the flow-run container (not recommended)
- You'll need:
  - The Docker socket mounted in the flow-run container
  - A build context accessible to the Docker daemon on the host (so your code must live on a host path and be mounted; building from `/app` inside the container won't work unless `/app` is also a host path mount)
  - The Docker CLI installed in the flow-run image, plus appropriate network/permissions
- It's almost always simpler and safer to build during `prefect deploy` and push to a registry.

Minimal examples
- Start the worker container:
docker run --rm -it \
  -e PREFECT_API_URL="<your-api-url>" \
  -e PREFECT_API_KEY="<your-api-key>" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.prefect:/root/.prefect \
  prefecthq/prefect:3-latest \
  prefect worker start --pool my-docker-pool
- Deployment `prefect.yaml` with shared volume and nested Docker access:
deployments:
  - name: my-deployment
    entrypoint: flows/my_flow.py:my_flow
    work_pool:
      name: my-docker-pool
    job_variables:
      image: "<http://ghcr.io/you/prefect-app:latest|ghcr.io/you/prefect-app:latest>"
      volumes:
        - "/var/run/docker.sock:/var/run/docker.sock"
        - "shared-data:/mnt/shared:rw"
- Task that spawns a sibling container:
from prefect import task
from prefect_docker.containers import create_docker_container, start_docker_container, get_docker_container_logs, remove_docker_container

@task
def run_sidecar():
    c = create_docker_container(
        image="alpine:3.20",
        command=["sh", "-c", "echo 'sidecar sees:' && ls -la /mnt/shared"],
        volumes=["shared-data:/mnt/shared:rw"],
        detach=True,
    )
    start_docker_container(c.id)
    logs = get_docker_container_logs(c.id, follow=True)
    remove_docker_container(c.id, force=True)
    return logs
Sanity checklist
- Worker has the Docker socket mounted? Then it can launch flow-run containers.
- Flow-run container needs to launch siblings? It must also have the Docker socket mounted.
- Need to share files? Use a named volume or the same absolute host path in both containers and in the sibling's creation request.
- Dockerfile/entrypoint aren't "passed through" the worker; they're baked into images during `prefect deploy`.

If you can share:
- Do you need to share specific files between flow-run and sibling containers? Named volume or host path?
- Are you trying to build images at runtime from inside the flow?

I'll tailor the exact `prefect.yaml` and volume mounts to your setup. With only a slight sigh.
@Marvin this path consideration problem for running a docker container inside the worker container and the possible mismatches, is it present for both options of 1. using docker cli / DinD or 2. using prefect-docker as per previous recommendations?
thought for 61.9 seconds
Short answer: the "which path is this?" problem is about the Docker daemon you're talking to, not the client you use. So yes, it exists with both the Docker CLI/DinD and the `prefect-docker` library. They just speak to the same daemon.

How paths are resolved depends on the daemon:
- Using host Docker via socket mount (Docker-outside-of-Docker)
  - Your worker container talks to the node's Docker daemon through `/var/run/docker.sock`.
  - In any volume spec `host_path:container_path`, the left side is a path on the node.
  - The worker's own filesystem is invisible unless a node path is mounted into it and then used as the host side for volumes.
  - Best to use named volumes or absolute node paths and mount them consistently into flow-run containers and any sibling containers you spin up from a task.
- Using DinD (Docker-in-Docker)
  - Your worker talks to a Docker daemon running inside a container (usually `docker:dind`).
  - In volume specs, the left side is a path inside the DinD container's filesystem, not the node.
  - If you want files to persist or be shared, you must mount host paths or named volumes into the DinD container first, then use those paths on the left side.
  - You'll typically set `DOCKER_HOST=tcp://dind:2375` (or similar) for the worker. DinD usually requires `--privileged`.
  - Networking/persistence are isolated inside the DinD daemon unless you expose/mount accordingly.

prefect-docker vs Docker CLI
- No difference in path semantics. `prefect_docker.containers.create_docker_container(...)` and `docker run` both submit a create request to whatever daemon `DOCKER_HOST`/socket points to. The daemon decides what the "host" path is.

Entrypoints / Dockerfiles
- Not passed at runtime. They're baked into the image you build during `prefect deploy`. The worker just pulls and runs the image.

Two minimal patterns

1) Host socket (recommended for simplicity)
- Worker container:
docker run --rm -it \
  -e PREFECT_API_URL=... -e PREFECT_API_KEY=... \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.prefect:/root/.prefect \
  prefecthq/prefect:3-latest \
  prefect worker start --pool my-docker-pool
- Deployment job variables (share data and allow nested Docker):
job_variables:
  image: ghcr.io/you/prefect-app:latest
  volumes:
    - "shared-data:/mnt/shared:rw"
    - "/var/run/docker.sock:/var/run/docker.sock"
- Task spawning a sibling container uses the same named volume:
create_docker_container(
  image="alpine:3.20",
  command=["sh","-c","ls -la /mnt/shared && echo done"],
  volumes=["shared-data:/mnt/shared:rw"],
  detach=True,
)
2) DinD (if you need isolation)
- docker-compose style:
services:
  dind:
    image: docker:dind
    privileged: true
    environment:
      - DOCKER_TLS_CERTDIR=
    volumes:
      - dind-lib:/var/lib/docker
      - workspace:/workspace         # shared in-DinD path
  worker:
    image: prefecthq/prefect:3-latest
    environment:
      - PREFECT_API_URL=...
      - PREFECT_API_KEY=...
      - DOCKER_HOST=tcp://dind:2375  # talk to DinD daemon
    depends_on: [dind]
    command: prefect worker start --pool my-docker-pool
    volumes:
      - workspace:/workspace         # optional: so your code can write into a path DinD can see
volumes:
  dind-lib:
  workspace:
- Deployment job variables (left side is inside the DinD container!)
job_variables:
  image: ghcr.io/you/prefect-app:latest
  volumes:
    - "/workspace/shared:/mnt/shared:rw"
Checklist to avoid path chaos
- Identify the daemon:
  - Host socket: left side is the node's filesystem.
  - DinD: left side is the DinD container's filesystem.
- Share data:
  - Prefer named volumes.
  - If you must use bind mounts, use absolute paths and mount them consistently in both the flow-run and any sibling containers.
- Need nested Docker from flow runs?
  - Mount the Docker socket where appropriate (host socket or DinD TCP/socket) into the flow-run container via deployment `volumes`.

Docs
- Docker worker how-to: Prefect 3 Docker worker

If you tell me which daemon you're targeting (host socket vs DinD) and what you need to share, I'll map your exact volume specs so they resolve correctly. Otherwise, I'll just keep guessing, which I'm told is "fun."
@Marvin would it just be cleaner to just take the built docker image as standalone and run it instead in the flow? i.e. skip the volume mounting, assume all deps and entrypoint scripts and configs are already done in the container during the build process? such that in the flow we either pull the image, or use the image currently found in the environment and run it?
thought for 52.9 seconds
Short answer: yes. If you can avoid sharing files at runtime, baking everything into the image and just running it is the cleanest path. No bind mounts, no path headaches.

You've got two clean variants:

A) Make the prebuilt image your flow-run image (preferred)
- Build an image that already contains your flow code, entrypoint, dependencies, configs, scripts: everything.
- Your deployment points at that image; the Docker worker just pulls and runs it.
- No nested container needed.
- Control pulling with `image_pull_policy` (IfNotPresent/Always/Never).

Example `prefect.yaml`:
deployments:
  - name: run-image-directly
    entrypoint: flows/my_flow.py:my_flow  # included in the image
    work_pool:
      name: my-docker-pool
    job_variables:
      image: "<http://ghcr.io/you/your-flow-image:2025-08-15|ghcr.io/you/your-flow-image:2025-08-15>"
      env:
        APP_MODE: prod
      image_pull_policy: IfNotPresent
      auto_remove: true
B) Keep a minimal controller flow and run a sidecar container (no volumes)
- Your flow just invokes a prebuilt image via the Docker daemon.
- Pass inputs via env vars or CLI args; send outputs over the network (API, S3/GCS, DB, queue) or read logs. Without volumes, you can't write files to the host.
- Works the same whether you use `prefect-docker` or the Docker CLI; both talk to whichever Docker daemon you configured.

Minimal task (no volumes) using prefect-docker:
from prefect import flow, task
from prefect_docker.containers import create_docker_container, start_docker_container, get_docker_container_logs, remove_docker_container
from prefect_docker.images import pull_docker_image

@task
def run_image(image: str, command=None, env: dict | None = None, pre_pull: bool = True) -> str:
    if pre_pull:
        pull_docker_image(image)  # optional but explicit

    c = create_docker_container(
        image=image,
        command=command,                 # e.g. ["sh", "-c", "my-app --flag"]
        environment=env or {},
        detach=True,
        auto_remove=False,               # we’ll clean up
    )
    try:
        start_docker_container(c.id)
        logs = get_docker_container_logs(c.id, follow=True)
        return logs
    finally:
        try:
            remove_docker_container(c.id, force=True)
        except Exception:
            pass

@flow
def controller():
    logs = run_image(
        image="<http://ghcr.io/you/my-job:stable|ghcr.io/you/my-job:stable>",
        command=None,         # use image entrypoint
        env={"CONFIG_MODE": "prod"}
    )
    return logs
When to choose which
- Option A (flow-run image) if the container you want to run is, effectively, "the flow." It's the canonical Prefect pattern and simplest operationally.
- Option B (sidecar) if your flow orchestrates multiple independent tools/images, and you're happy to pass data via network APIs or logs (no local file sharing).

Notes and gotchas
- Without volumes, any dynamic inputs/outputs must be handled via env vars/args and remote storage/services. You cannot "smuggle" files in or out.
- If the image must pull secrets/configs at runtime, inject them as env vars via deployment `job_variables.env` or Prefect blocks, not files.
- Pull behavior:
  - For flow-run images, set `job_variables.image_pull_policy`.
  - For sidecars, either call `pull_docker_image(image)` first or let Docker pull on first run.

If you share whether your target image already contains the flow code (Option A) or is a separate tool you want to call (Option B), I'll pin down the exact `prefect.yaml` and task code. Otherwise, I'll assume the cleanest thing and be only mildly disappointed.
@Marvin for option A how does prefect deal with input and output files? considering that this will be either a task in a flow (e.g task 1 (non-docker), task 2 (non-docker), task 3 (docker)), or a subflow in the main flow (i.e task 3 but running as a subflow that is in a docker container).
thought for 106.6 seconds
Great question. Two key truths first:
- A deployment defines infrastructure for the entire flow run. You can't have "task 1 non-Docker, task 3 Docker" within the same flow run.
- To run just "task 3" in Docker while earlier tasks don't, you model task 3 as its own flow with its own Docker deployment and call it from the parent (subflow via deployment). Or you spawn a sidecar container from a task (Option B), which you want to avoid here.

So with Option A (prebuilt image as the flow-run image), file inputs/outputs don't magically traverse environments. Use one of these patterns:

Cleanest patterns for inputs/outputs

1) Remote object storage (recommended)
- Parent flow (non-Docker tasks) writes inputs to S3/GCS/Azure and passes object URIs to the Dockerized child flow via parameters.
- Child flow reads the URIs, does its work, writes outputs to storage, and returns the output URIs (or just logs them).
- No volumes. No path mismatches. Works across any infra boundary.

2) API/DB handoff
- Pass identifiers in params; the child reads/writes via an API or database. Same idea as storage, different medium.

3) Shared filesystem (only if you must)
- If the parent runs on a VM and the child runs in Docker on the same host:
  - Use a host bind mount path, e.g., /srv/shared, mounted into the child's container as /srv/shared.
  - Parent writes to /srv/shared on the host; child sees it at /srv/shared in the container.
- Named volumes don't help the host process; they're managed by Docker and not directly readable on the host without extra gymnastics.

What this looks like in code

Parent flow calling a Dockerized child deployment (subflow)
- Parent runs wherever (e.g., process worker). Child is its own deployment on a Docker work pool.
- Pass storage URIs as parameters. Do not expect a return value to carry a large payload; use storage and return URIs instead.
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent_flow():
    # 1) Produce input and upload to storage
    input_uri = "<s3://my-bucket/input/data.parquet>"  # write this in a prior task

    # 2) Kick off the Dockerized child as a subflow (tracked in UI)
    run_deployment(
        name="docker-child/my-deployment",
        parameters={"input_uri": input_uri},
        as_subflow=True,
    )

    # 3) The child writes outputs to storage and returns an output URI via state or logs
    # You can pass an expected output URI or convention:
    output_uri = "<s3://my-bucket/output/result.parquet>"
    # Read it in a subsequent task if needed
Child flow (Docker deployment) reading from storage and writing back
- Bake all deps, entrypoints, and configs into the image.
- Use env vars or blocks for credentials; write outputs back to storage.
from prefect import flow, task

@task
def process(input_uri: str) -> str:
    # download input_uri -> process -> upload output_uri
    # return output URI
    output_uri = input_uri.replace("/input/", "/output/")  # for illustration only
    return output_uri

@flow
def docker_child_flow(input_uri: str) -> str:
    return process(input_uri)
Child deployment (Docker) in prefect.yaml
- Points at your prebuilt image. No volumes needed.
deployments:
  - name: my-deployment
    entrypoint: docker_child_flow.py:docker_child_flow
    work_pool:
      name: my-docker-pool
    job_variables:
      image: "<http://ghcr.io/you/child-image:2025-08-15|ghcr.io/you/child-image:2025-08-15>"
      image_pull_policy: IfNotPresent
      auto_remove: true
      env:
        MODE: prod
If you insist on files without storage
- The parent must write to a host path, e.g., /srv/shared.
- Mount the same host path into the child container via job_variables.volumes: `/srv/shared:/srv/shared:rw`
- Be mindful of which daemon you're using:
  - Host socket: the left side is the host path.
  - DinD: the left side is the DinD container's path, so you must mount /srv/shared into the DinD container first. Otherwise, enjoy the abyss.

FAQs
- Can a subflow (called as a function) run in a different container? No. If you call a @flow function directly, it inherits the parent's infra. To change infra, deploy it separately and call it with run_deployment.
- Can I pass large binary results directly between parent and child? No, not safely or portably. Use object storage and pass URIs.
- How do I return something meaningful? Return small metadata (URIs, IDs, counts). Store the heavy bits externally.

Useful links
- Docker worker guide: Docker worker in Prefect 3
- Running deployments from flows: `run_deployment` lets you spawn that Dockerized child; it's available via `prefect.deployments` (we verified the signature).

If you tell me which storage you prefer (S3/GCS/Azure/NFS) and where your parent runs (VM? container?), I'll give you the exact mounts or SDK snippets. I enjoy eliminating path bugs almost as much as I enjoy existential dread.
@Marvin i am currently using local dev environment with multiple docker containers for this, e.g. postgres, redis, mlflow, prefect-server, prefect-worker, prefect-services and my own api. i want to avoid s3/gcs/azure for now. what is the best approach for my current setup that can scale to s3 later on?
thought for 96.6 seconds
Yes. For your local stack, the cleanest approach that scales to S3 later is:
- Keep using Option A (prebuilt image as the flow-run image).
- Share data locally via a single named Docker volume mounted at a consistent path (e.g., /mnt/shared).
- In code, read/write using URI-style paths via fsspec. Today use file:/// URIs. Later switch to s3:// with no code changes (just install s3fs and change the base URI).

This avoids host/DinD path headaches, keeps everything inside Docker, and gives you a straight shot to S3 later.

Why this works well
- Named volume: simple, portable across your containers; no bind-mount absolute path juggling.
- fsspec: one IO API for both local and S3. You return URIs between tasks/flows, not file bytes.
- Migration plan: change file:///mnt/shared/... to s3://bucket/... and install s3fs. Done.

Compose: declare a named volume and mount it into services that need local files
services:
  prefect-server:
    image: prefecthq/prefect:3-latest
    # ...
  prefect-worker:
    image: prefecthq/prefect:3-latest
    command: prefect worker start --pool my-docker-pool
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # talk to host Docker daemon

  my-api:
    image: your/api:latest
    volumes:
      - shared-data:/mnt/shared

  mlflow:
    image: ghcr.io/mlflow/mlflow
    volumes:
      - shared-data:/mnt/shared

volumes:
  shared-data:
Prefect deployment: mount the same volume into flow-run containers
deployments:
  - name: dockerized-flow
    entrypoint: flows/dockerized_flow.py:my_flow
    work_pool:
      name: my-docker-pool
    job_variables:
      image: ghcr.io/you/your-flow-image:dev
      volumes:
        - "shared-data:/mnt/shared:rw"
      auto_remove: true
      image_pull_policy: IfNotPresent
Use URI-based IO with fsspec (works now with file://, later with s3://)
# pip install fsspec   # later: pip install s3fs

import os
import fsspec
from urllib.parse import urljoin

STORE_BASE = os.environ.get("STORE_BASE", "file:///mnt/shared/")  # later: s3://your-bucket/

def uri(*parts: str) -> str:
    u = STORE_BASE
    for p in parts:
        u = urljoin(u if u.endswith("/") else u + "/", p)
    return u

def write_text(u: str, text: str) -> str:
    with fsspec.open(u, "w") as f:
        f.write(text)
    return u

def read_text(u: str) -> str:
    with fsspec.open(u, "r") as f:
        return f.read()
Parent/child flow pattern (all inside Docker, or parent calls a separate Dockerized subflow)
- Inside one flow run (all tasks in the same container), just read/write file:///mnt/shared/… as needed.
- If splitting into a separate Dockerized subflow, pass only URIs between them.
from prefect import flow, task
from prefect.deployments import run_deployment

@task
def produce_input() -> str:
    u = uri("inputs", "data.txt")
    return write_text(u, "hello world")

@flow
def parent_flow():
    input_uri = produce_input()
    # Option A: next task in same container
    result_uri = process_data(input_uri)
    # Option B: call Dockerized child deployment (different container)
    run_deployment(
        name="dockerized-flow/my-deployment",
        parameters={"input_uri": input_uri},
        as_subflow=True,
    )

@task
def process_data(input_uri: str) -> str:
    text = read_text(input_uri)
    out_uri = uri("outputs", "result.txt")
    return write_text(out_uri, text.upper())
Later migration to S3
- Set `STORE_BASE=s3://your-bucket/prefix/`
- `pip install s3fs`
- Ensure AWS creds in env/IRSA/etc.
- Remove the volume from deployments if you no longer need local files.

Notes
- Prefer the host Docker socket over DinD for simplicity.
- Keep paths consistent: always use /mnt/shared inside containers.
- Only pass lightweight metadata/URIs between tasks/flows; keep heavy data in the "store."

Docs
- Docker worker guide: Prefect 3 Docker worker
- fsspec: fsspec docs

If you share which services need to see the same files (API, MLflow, etc.), I'll map exact mounts/URIs. Otherwise I'll assume "shared-data:/mnt/shared" everywhere and quietly hope for the best.
@Marvin do i need to pre-create a separate deployemnt with docker worker instead of my original deployment for non docker? such that when i call run_deployment it references the docker deployment instead of the original one? will the subflow show correctly in the ui?
thought for 53.9 seconds
Short answers:
- Yes: create a separate deployment that targets a Docker work pool for the part you want to run in Docker. Your "non-Docker" deployment remains as-is.
- Call that Docker deployment from your parent with `run_deployment(...)`.
- Yes: if you set `as_subflow=True` (the default), the child run will appear nested under the parent in the UI with proper links.

Why
- A deployment defines the infra for the entire flow run. To switch infra mid-workflow, you invoke a separate deployment.
- Calling a flow function directly keeps the parent's infra. Calling `run_deployment` executes it under the child deployment's infra (Docker).

Minimal pattern
1) Two deployments (can be the same flow code or a different flow):
- Parent (non-Docker), e.g. a process worker or your local run
- Child (Docker), on a Docker work pool with its image

prefect.yaml (excerpt)
deployments:
  - name: parent-non-docker
    entrypoint: flows/parent.py:parent_flow
    work_pool:
      name: my-process-pool   # or however you run the parent
    job_variables: {}

  - name: child-docker
    entrypoint: flows/child.py:child_flow
    work_pool:
      name: my-docker-pool
    job_variables:
      image: "<http://ghcr.io/you/child-image:dev|ghcr.io/you/child-image:dev>"
      image_pull_policy: IfNotPresent
      auto_remove: true
      # For local file sharing with a non-docker parent, mount a host path:
      # volumes:
      #   - "/srv/shared:/srv/shared:rw"
2) Parent calls the Dockerized child
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent_flow():
    # do non-docker things
    # then call docker child
    fr = run_deployment(
        name="child-docker",           # or "child-flow/child-docker"
        parameters={"input_uri": "file:///srv/shared/inp.txt"},
        as_subflow=True,               # default True, nests nicely in UI
        # Optional per-run overrides:
        # job_variables={"image": "ghcr.io/you/child-image:abc123"}
    )
    return fr
UI behavior
- The child run shows nested under the parent run when `as_subflow=True` (the default). You'll see linked runs and state propagation in the timeline.

Local file handoff today, S3 later
- If your parent is truly non-Docker (runs on the host) and the child is Docker:
  - Use a host bind path (e.g., `/srv/shared`) that the parent writes to directly, and mount that same host path into the Docker deployment via `job_variables.volumes: ["/srv/shared:/srv/shared:rw"]`.
- If both parent and child run in Docker on the same node:
  - Prefer a named volume (e.g., `shared-data:/mnt/shared`) mounted into both containers.
- To migrate to S3 later, switch to URI-based IO via fsspec now (use `file:///srv/shared/...` today; later flip to `s3://bucket/...` and install `s3fs`). Keep passing URIs between parent and child.

CLI you'll actually use
- Create a Docker work pool: `prefect work-pool create my-docker-pool --type docker`
- Start a worker: `prefect worker start --pool my-docker-pool`
- Deploy: `prefect deploy`

Docs
- Docker worker how-to: Docker worker in Prefect 3
- Running deployments from flows: `prefect.deployments.run_deployment` (default `as_subflow=True`)

If you confirm whether your parent runs as a process on the host or in a container, I'll give you the exact volume/URI setup to avoid any "where did my file go?" moments.
@Marvin will this make sense in my docker-compose to create the pools and workers?

prefect-worker:
  build:
    context: .
    dockerfile: docker/prefect_worker/Dockerfile
  depends_on:
    - prefect-server
  environment:
    - PREFECT_API_URL=http://prefect-server:4200/api
    - PYTHONPATH=/home/worker/src
  volumes:
    - ./src:/home/worker/src
    - prefect_data:/opt/prefect
    - shared_data:/mnt/shared
    - /var/run/docker.sock:/var/run/docker.sock
  working_dir: /home/worker
  command: >
    bash -c "
    prefect work-pool create "local-docker-pool" --type docker &&
    prefect work-pool create "local-pool" --type process &&
    prefect worker start --pool local-pool &&
    prefect worker start --pool local-docker-pool
    "

or should there be another worker service for docker?
thought for 81.2 seconds
Short version: run two separate worker services. Don't try to start both workers in one container. And if you use a named volume for shared data, make it an external volume so the Docker worker's containers can see the same volume name.

Why your current command won't behave
- `prefect worker start` blocks. With `&&`, the first worker will run forever and the second will never start.
- You could background one with `&`, but then you're in init/supervisor land. It's brittle. Use two services.

Recommended docker-compose shape
- One service for the process worker
- One service for the Docker worker
- Optional: a one-time "init" service to create/update work pools

Example:
services:
  prefect-init:
    image: prefecthq/prefect:3-latest
    depends_on:
      - prefect-server
    environment:
      - PREFECT_API_URL=http://prefect-server:4200/api
    command: >
      bash -lc "
      prefect work-pool create 'local-pool' --type process || true &&
      prefect work-pool create 'local-docker-pool' --type docker || true
      "
    restart: "no"

  prefect-worker-process:
    image: prefecthq/prefect:3-latest
    depends_on:
      - prefect-server
      - prefect-init
    environment:
      - PREFECT_API_URL=http://prefect-server:4200/api
      - PYTHONPATH=/home/worker/src
    working_dir: /home/worker
    volumes:
      - ./src:/home/worker/src
      - prefect_data:/opt/prefect
      - shared_data:/mnt/shared     # optional; only if your process tasks need it
    command: >
      bash -lc "prefect worker start --pool local-pool"

  prefect-worker-docker:
    image: prefecthq/prefect:3-latest
    depends_on:
      - prefect-server
      - prefect-init
    environment:
      - PREFECT_API_URL=http://prefect-server:4200/api
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - prefect_data:/opt/prefect
      # Do NOT rely on this container’s mounts for flow-run volumes; those are set in deployment job_variables.
    command: >
      bash -lc "prefect worker start --pool local-docker-pool"

volumes:
  prefect_data:
  # Use an external named volume so Docker worker-launched containers can mount it by the same name
  shared_data:
    external: true
    name: shared_data
Important notes
- External named volume: If you want to mount a named volume into flow-run containers via deployment `job_variables: volumes: ["shared_data:/mnt/shared"]`, the volume must exist globally with that exact name. Compose otherwise prefixes volume names (e.g., project_shared_data), and your worker-launched containers won't find "shared_data". Create it once:
docker volume create shared_data
Then declare it as `external: true` with `name: shared_data` in compose.
- Alternative: bind mount a host path instead of a named volume (avoids name scoping issues):
  - In compose services: `- /srv/shared:/mnt/shared`
  - In Docker deployments' job_variables: `- "/srv/shared:/mnt/shared"`
  - The left side must be a real host path that the Docker daemon can see.
- The Docker worker's own container mounts do not automatically propagate to flow-run containers. Volumes for flow-run containers are defined in the deployment's `job_variables.volumes`.
- Pool creation:
  - Better as a one-time init job (as shown) rather than on every worker start.
  - `|| true` keeps it idempotent if the pool already exists.
- Your `PREFECT_API_URL` for Server is fine (`http://prefect-server:4200/api`) inside the Compose network.
- UI: When the parent calls `run_deployment(..., as_subflow=True)` for the Dockerized child, you'll see the child run nested under the parent in the UI.

If you want, I can sketch the matching `prefect.yaml` for both your process and docker deployments, including the correct `job_variables.volumes` for either the external named volume or the host bind path. I promise only a minimal amount of sardonic commentary.