Darius Tan (08/14/2025, 9:39 AM)

Marvin (08/14/2025, 9:42 AM)
```python
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent(items: list[str]):
    # fan out: one containerized run per item
    for item in items:
        # "heavy-job/heavy-docker" => flow-name/deployment-name
        run_deployment(
            name="heavy-job/heavy-docker",
            parameters={"item": item},
        )

# Your separate subflow (deployed to a Docker work pool)
# @flow
# def heavy_job(item: str):
#     ...
```
How to set it up
- Create a Docker work pool (the UI is easiest).
- Deploy the subflow to that pool with your desired image and job variables. In Prefect 3.x you'll use from-source deployments, e.g. `flow.from_source(...).deploy(...)`; a sketch follows this list. See the deploy docs: Deploying in Prefect 3
- Trigger via `run_deployment` as shown above. API ref: run_deployment
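A minimal sketch of that deploy step, assuming Prefect 3.x; the repo URL, entrypoint, pool name, and image are placeholders:
```python
# Hedged sketch: register the subflow as a deployment on a Docker work pool.
# Run once (or in CI); a Docker worker then pulls this source for each flow run.
from prefect import flow

if __name__ == "__main__":
    flow.from_source(
        source="https://github.com/you/your-repo",   # placeholder repo
        entrypoint="flows/heavy_job.py:heavy_job",   # path:function within that repo
    ).deploy(
        name="heavy-docker",
        work_pool_name="docker-pool",
        job_variables={"image": "myorg/myimg:tag"},  # image the worker runs
    )
```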
Alternatives (if you must)
- Fire-and-forget a container from a task: use the prefect-docker collection to create/start/monitor a container from within your local flow run. This won't be a Prefect-managed task run; you'll be managing the container lifecycle and error handling yourself.
  - Functions you'd use: create_docker_container, start_docker_container, get_docker_container_logs, remove_docker_container
  - Docs: prefect-docker collection
- Experimental shortcut: decorate a subflow to always run via a Docker work pool without creating a deployment:
```python
from prefect import flow
from prefect_docker.experimental.decorators import docker

@docker(work_pool="docker-pool", image="myorg/myimg:tag")
@flow
def heavy_job(item: str):
    ...
```
  Then call heavy_job(item) from your parent flow and it'll spin up a Docker-backed subflow run. It's experimental; deployments are more battle-tested.
  - Source: prefect-docker repo
If you tell me whether you want one container per list item or just a single offloaded step occasionally, I can sketch the exact wiring (image, env, volumes, result passing) to keep it tidy.

Darius Tan (08/14/2025, 9:46 AM)

Marvin (08/14/2025, 9:48 AM)
1) Deployment method: in Prefect 3.x you'd typically use `flow.from_source(...).deploy(...)`, but if you already have a registered deployment, the pattern below still works just fine.
- If you really have `ato_deployment`, I suspect a typo. If not, mind sharing your Prefect version? That'll help me avoid guessing.
2) Does the "run a subflow in its own Docker container" pattern still work?
- Yes. It doesn't matter how the parent is deployed (module entrypoint, local, etc.). From the parent flow, call `run_deployment(...)` for the subflow that's deployed to a Docker work pool. Each call creates a new flow run that a Docker worker will execute in its own container.
- Example fan-out:
```python
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent(items: list[str]):
    for item in items:
        run_deployment("heavy-job/heavy-docker", parameters={"item": item})
```
- API ref: run_deployment
- Deploy docs (3.x): Deploying in Prefect 3
3) You’re already running the parent inside Docker — does that change anything?
- For the subflow-via-Docker-work-pool pattern: still fine. The parent just asks the API to start another flow run. A Docker worker (wherever it’s running) will launch the container. If your worker is also in Docker, make sure it can access Docker (bind mount /var/run/docker.sock
or run DinD).
- For the “start a container from a task” pattern using `prefect-docker`: also fine, but your parent container must be able to talk to the Docker daemon (same socket/DIN-D caveats). You’ll be managing container lifecycle yourself:
- Functions: create_docker_container
, start_docker_container
, get_docker_container_logs
, remove_docker_container
- Collection docs: prefect-docker
Optional/experimental
- You can mark a subflow to always run on a Docker work pool without creating a deployment:
```python
from prefect import flow
from prefect_docker.experimental.decorators import docker

@docker(work_pool="docker-pool", image="myorg/myimg:tag")
@flow
def heavy_job(item: str):
    ...
```
- Then call heavy_job(item) from the parent. This is experimental; deployments are more battle-tested.
- Repo: prefect-docker
If you confirm your Prefect version and the exact method you're using (to_deployment vs deploy), I can show the precise snippet to register the subflow deployment and wire up the Docker work pool cleanly.

Darius Tan (08/14/2025, 9:51 AM)

Marvin (08/14/2025, 9:51 AM)
`my_flow.ato_deployment(...); await my_deployment.apply()` is a valid 3.x pattern (the async variant), so you don't need to change that for this to work.
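A minimal sketch of that async registration, assuming Prefect 3.x; the deployment name and pool are placeholders, and exact signatures can vary by version:
```python
# Hedged sketch: async deployment registration with ato_deployment + apply.
import asyncio
from prefect import flow

@flow
def heavy_job(item: str):
    ...

async def register() -> None:
    deployment = await heavy_job.ato_deployment(
        name="heavy-docker",
        work_pool_name="docker-pool",  # assumed Docker work pool name
    )
    await deployment.apply()

if __name__ == "__main__":
    asyncio.run(register())
```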
Answers to your two checks:
- Does the "subflow via run_deployment to a Docker work pool" pattern still work? Yes. It's independent of how the parent flow is deployed. As long as the subflow has a registered deployment targeting a Docker work pool, calling `run_deployment("flow-name/deployment-name", parameters=...)` from the parent will spin up a separate Docker container for each call (see the sketch below the links).
  - API ref: run_deployment
  - Deploy docs: Deploying in Prefect 3
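A minimal sketch of that call pattern with result checking, reusing the earlier "heavy-job/heavy-docker" deployment; by default run_deployment waits for the child run to finish and returns a FlowRun:
```python
# Hedged sketch: fan out, wait for each child container run, and collect failures.
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent(items: list[str]) -> list[str]:
    failed = []
    for item in items:
        fr = run_deployment(
            name="heavy-job/heavy-docker",  # flow-name/deployment-name
            parameters={"item": item},
        )  # blocks until the child run reaches a final state (timeout=None)
        if fr.state is None or not fr.state.is_completed():
            failed.append(item)
    return failed
```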
- You’re already running the parent in Docker — do both approaches still work?
- Subflow via Docker work pool: Yes. Only the Docker worker needs access to a Docker daemon. The parent container doesn’t need Docker access for this path; it just asks the API to start another flow run that the worker executes.
- Manually launching a container from a task (using prefect-docker
): Also yes, but then your parent container must be able to talk to Docker (e.g., mount /var/run/docker.sock
or run with DinD). You’ll manage container lifecycle and error propagation yourself.
- Collection docs: prefect-docker
Two practical caveats to keep life less exciting:
- Make sure your subflow deployment either references a base image that already contains your code or uses a source (e.g., Git) the Docker worker can pull. Otherwise the worker's container won't have your flow code.
- If you go the manual prefect-docker route, remember to clean up containers on failure paths, and consider how you'll surface logs/results back to the flow.
If you want, share the name of your Docker work pool and how you package the subflow code (baked into the image vs. from_source), and I'll sketch the exact deployment + run_deployment wiring.

Darius Tan (08/14/2025, 9:55 AM)
Marvin (08/14/2025, 9:56 AM)
That error means your worker container doesn't have the docker CLI installed. Mounting /var/run/docker.sock just gives you a socket to talk to the daemon; you still need a client inside the container, and it needs permission to use the socket.
You’ve got three viable paths:
1) Keep shelling out to docker
(install the CLI + handle permissions)
- Extend your worker image to include the Docker CLI.
- Debian/Ubuntu-based Prefect image:
FROM prefecthq/prefect:3-python3.11
RUN apt-get update && apt-get install -y --no-install-recommends <http://docker.io|docker.io> && rm -rf /var/lib/apt/lists/*
- Alpine-based (if you’re using Alpine):
FROM prefecthq/prefect:3-python3.11
RUN apk add --no-cache docker-cli
- docker-compose snippet (run as root to dodge socket group headaches, or add the user to the socket's group):
```yaml
services:
  worker:
    build: .
    user: root
    environment:
      - PREFECT_API_URL=...
      - PREFECT_API_KEY=...
      - DOCKER_HOST=unix:///var/run/docker.sock
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./src:/home/worker/src
    command: >
      prefect worker start --pool "docker-pool"
```
- If you don’t want to run as root, ensure the container user is in the group that owns the socket:
# inside the container at startup
DOCKER_GID=$(stat -c '%g' /var/run/docker.sock)
groupadd -g "$DOCKER_GID" docker || true
usermod -aG docker <your-user>
Then drop user: root
.
2) Don’t use the CLI at all; use the Docker SDK (what prefect-docker
does)
- No docker
binary required; just mount the socket and install the Python package.
- Example within a task:
from prefect import task
from prefect_docker.containers import create_docker_container, start_docker_container, get_docker_container_logs, remove_docker_container
@task
def run_container(image: str, command: list[str]):
cid = create_docker_container(image=image, command=command)
try:
start_docker_container(cid)
logs = get_docker_container_logs(cid)
return logs
finally:
remove_docker_container(cid)
- Ensure the worker image has prefect[docker]
(and thus the Docker SDK) installed and the socket mounted:
pip install "prefect[docker]"
- Docs: prefect-docker
3) Use a sidecar DinD daemon instead of the host socket
- Run a docker:dind service in your compose file and point DOCKER_HOST at it:
```yaml
services:
  dind:
    image: docker:dind
    privileged: true
  worker:
    image: your-worker
    environment:
      - DOCKER_HOST=tcp://dind:2375
    depends_on:
      - dind
```
- You still need the CLI (path 1) or the SDK (path 2) in the worker.
Given your error, the minimal fix is: install the Docker CLI in your worker image. After that, if you hit "permission denied" against the socket, run the worker as root or add the user to the socket's group. If you can change your code, switching to the Docker SDK (prefect-docker) is cleaner and avoids the CLI entirely.

Darius Tan (08/15/2025, 2:40 AM)

Marvin (08/15/2025, 2:44 AM)
CLI references we verified:
- prefect work-pool create --help
- prefect worker start --help
- prefect deploy --help
- Docker collection API (what you call from tasks): functions live in the prefect-docker collection:
  - Container helpers: prefect_docker.containers (create/start/stop/remove/get logs)
  - Deployment build steps: prefect_docker.deployments.steps (build/push image)
  - Worker config type: prefect_docker.worker.DockerWorkerJobConfiguration
- Prefect image: prefecthq/prefect
What to install
- On the machine running the worker (host or container):
  - Docker Engine available to the worker
  - Python and Prefect 3
  - The Docker collection package
- Commands:
```
pip install "prefect>=3" prefect-docker
```
If you prefer a "one-liner with extras" for CLI use:
```
pip install "prefect[docker]"
```
Setting up a Docker work pool and worker
1) Create a Docker work pool
```
prefect work-pool create my-docker-pool --type docker
```
2) Build and push your flow image during deploy (recommended)
Use prefect.yaml with the docker build/push steps from the prefect-docker collection, then point your deployment at the resulting image.
Example `prefect.yaml`:
```yaml
name: my-project
build:
  - prefect_docker.deployments.steps.build_docker_image:
      id: build_image
      image_name: ghcr.io/you/prefect-app
      tag: "{{ flow_version | default('latest') }}"
      dockerfile: Dockerfile
  - prefect_docker.deployments.steps.push_docker_image:
      requires: build_image
      image_name: "{{ build_image.image_name }}"
      tag: "{{ build_image.tag }}"
      credentials: "{{ prefect.blocks.docker-registry-credentials.my_creds }}"
deployments:
  - name: my-deployment
    entrypoint: flows/my_flow.py:my_flow
    work_pool:
      name: my-docker-pool
      job_variables:
        image: "{{ build_image.image_name }}:{{ build_image.tag }}"
        env:
          EXAMPLE: hello
        volumes:
          - "/host/data:/app/data:ro"
        network_mode: bridge
        auto_remove: true
```
3) Run a worker
- On a host with Docker:
```
prefect worker start --pool my-docker-pool
```
- Inside a Docker container (so it can launch sibling containers), mount the Docker socket:
```bash
docker run --rm -it \
  -e PREFECT_API_URL="https://api.prefect.cloud/api/accounts/<acct>/workspaces/<ws>" \
  -e PREFECT_API_KEY="<your-api-key>" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.prefect:/root/.prefect \
  prefecthq/prefect:3-latest \
  prefect worker start --pool my-docker-pool
```
Notes:
- Mounting /var/run/docker.sock gives the worker access to the host Docker daemon. Your security team will have opinions.
- For Prefect Server, use your server URL in PREFECT_API_URL.
Optional: a basic Dockerfile for your flow image
```dockerfile
FROM prefecthq/prefect:3-latest
WORKDIR /opt/prefect
COPY . /opt/prefect
RUN pip install -U pip && \
    pip install -r requirements.txt
ENV PYTHONPATH=/opt/prefect
```
Pattern: create and manage a Docker container from a task
You generally don't need to do this for "normal" Prefect runs; the Docker worker already launches a container for each flow run. But if your task needs to spin up a sibling container (e.g., run a tool that only exists as a container), use the prefect-docker collection.
This pattern:
- Pulls a private image if needed (optional)
- Creates and starts the container
- Streams logs, waits, checks exit status
- Cleans up in finally
```python
from prefect import flow, task
from prefect_docker.containers import (
    create_docker_container,
    start_docker_container,
    get_docker_container_logs,
    remove_docker_container,
)
```
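A minimal sketch of those steps, assuming the helpers above plus pull_docker_image from prefect_docker.images (used again later in this thread), and assuming create_docker_container returns a docker-py Container so .wait() exposes the exit code; the image and command are placeholders:
```python
from prefect import task
from prefect_docker.images import pull_docker_image
from prefect_docker.containers import (
    create_docker_container,
    start_docker_container,
    get_docker_container_logs,
    remove_docker_container,
)

@task
def run_tool(image: str, command: list[str] | None = None) -> str:
    pull_docker_image(image)  # optional: pull a private image explicitly
    container = create_docker_container(image=image, command=command, detach=True)
    try:
        start_docker_container(container.id)
        # follow=True streams logs until the container exits
        logs = get_docker_container_logs(container.id, follow=True)
        status = container.wait()  # assumed docker-py Container; e.g. {"StatusCode": 0}
        if status.get("StatusCode", 1) != 0:
            raise RuntimeError(f"container exited with {status}")
        return logs
    finally:
        remove_docker_container(container.id, force=True)
```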
Marvin (08/15/2025, 2:44 AM)
Container options you can set per deployment (via job_variables):
- image (required): the flow-run image
- env: dictionary of environment variables for the flow run
- volumes: e.g., ["/host/path:/container/path:rw"]
- network_mode: e.g., bridge or a custom network
- mem_limit, memswap_limit, privileged, auto_remove, image_pull_policy (IfNotPresent|Always|Never)
Signature reference: prefect_docker.worker.DockerWorkerJobConfiguration. A sketch follows below.
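The same knobs in a deployment's work_pool section, as a hedged sketch (values are placeholders; check your pool's base job template for the exact accepted names and formats):
```yaml
work_pool:
  name: my-docker-pool
  job_variables:
    image: ghcr.io/you/prefect-app:latest  # required
    env:
      LOG_LEVEL: INFO
    volumes:
      - "/host/path:/container/path:rw"
    network_mode: bridge
    mem_limit: 2g          # assumed format; mirrors Docker's --memory
    privileged: false
    auto_remove: true
    image_pull_policy: IfNotPresent
```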
Common gotchas and best practices
- If running the worker in Docker, mount the Docker socket (-v /var/run/docker.sock:/var/run/docker.sock) so it can start flow-run containers.
- Pin image tags in prefect.yaml (e.g., commit SHA or flow version).
- For private registries, use a DockerRegistryCredentials block and reference it in the push step, or log in on the worker.
- Keep auto_remove: true for flow-run containers to avoid littering the host; for debugging, set it to false.
- Don't create containers from tasks unless you truly need to; it's extra moving parts. The Docker worker already launches a container for each flow run.
If you share how you're deploying (local vs containerized worker), private registry details, and whether your flow needs to run nested containers for work, I can tailor the exact prefect.yaml, Dockerfile, and worker command for your case. I live to serve. Begrudgingly.

Darius Tan (08/15/2025, 7:18 AM)
Marvin (08/15/2025, 7:20 AM)
- Volume specs (e.g., /host/path:/container/path:rw) in deployment job variables and in prefect_docker.containers.create_docker_container(...) are interpreted on the Docker host, not inside the worker or flow-run container.
- The worker does not "pass" your Dockerfile or entrypoint script at runtime; those belong in the image you build during prefect deploy. The worker just pulls an image and runs it.
Links
- Docker worker how-to: Prefect 3 Docker worker
- Prefect Docker collection (APIs you might call from tasks): prefect-docker (module contains containers, deployments.steps, worker)
- CLI refs we verified: prefect work-pool create --help, prefect worker start --help, prefect deploy --help
What to mount where
- Worker container (so it can launch siblings):
  - Mount the Docker socket: -v /var/run/docker.sock:/var/run/docker.sock
  - Mount the Prefect config dir if you want persistence: -v ~/.prefect:/root/.prefect
  - Nothing else is required for the worker to launch flow-run containers.
- Flow-run containers (what the worker starts per flow run):
  - If your tasks will start "sibling" containers, the flow-run container also needs Docker access. Mount the socket in deployment job variables:
```yaml
job_variables:
  volumes:
    - "/var/run/docker.sock:/var/run/docker.sock"
```
  - Any files you want available to both the flow-run container and sibling containers must come from the host (or a named volume) and be mounted into both.
Two safe patterns to avoid path hell
1) Named volume pattern (best for sharing scratch/work dirs)
- Create/use a named volume (Docker will create it if missing).
- Mount it into both the flow-run container (via deployment) and sibling containers (via your task):
  - Deployment:
```yaml
job_variables:
  volumes:
    - "shared-data:/mnt/shared:rw"
    - "/var/run/docker.sock:/var/run/docker.sock"
```
  - Task creating a sibling container:
```python
create_docker_container(
    image="alpine:3.20",
    command=["sh", "-c", "ls -la /mnt/shared && echo done"],
    volumes=["shared-data:/mnt/shared:rw"],
    detach=True,
)
```
- No host paths to keep in sync; the engine manages the named volume.
2) Absolute host path pattern (if you need a real host directory)
- Pick a concrete host path and mount it identically into both containers:
  - Deployment:
```yaml
job_variables:
  volumes:
    - "/srv/data:/srv/data:rw"
    - "/var/run/docker.sock:/var/run/docker.sock"
```
  - Task creating a sibling container:
```python
create_docker_container(
    image="alpine:3.20",
    command=["sh", "-c", "ls -la /srv/data && echo done"],
    volumes=["/srv/data:/srv/data:rw"],
    detach=True,
)
```
- Inside your flow-run code, when you call create_docker_container(..., volumes=...), remember the left-hand side is a host path, not the flow-run container's path.
What not to do
- Don't use paths that only exist inside the worker container in your deployment volumes. The Docker daemon will happily create a new host directory (e.g., /data) and mount that instead, which is… not what you meant.
- Don't rely on relative paths. Absolute only.
- Don't assume the worker's current working directory matters to flow-run/sibling containers. It doesn't.
About Dockerfile and entrypoint scripts
- They're baked into the image you build during prefect deploy (e.g., prefect_docker.deployments.steps.build_docker_image and push_docker_image). The worker never needs your Dockerfile or your entrypoint script on disk.
- If your sibling container needs its own script, bake it into that sibling image too. There's no "pass-through" of files from the worker or flow-run container unless you mount them from the host or a named volume.
If you must build images at runtime inside the flow-run container (not recommended)
- You'll need:
  - The Docker socket mounted in the flow-run container
  - A build context accessible to the Docker daemon on the host (so your code must live on a host path and be mounted; building from /app inside the container won't work unless /app is also a host path mount)
  - The Docker CLI installed in the flow-run image and appropriate network/permissions
- It's almost always simpler and safer to build during prefect deploy and push to a registry.
Minimal examples
- Start worker container
```bash
docker run --rm -it \
  -e PREFECT_API_URL="<your-api-url>" \
  -e PREFECT_API_KEY="<your-api-key>" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.prefect:/root/.prefect \
  prefecthq/prefect:3-latest \
  prefect worker start --pool my-docker-pool
```
- Deployment prefect.yaml with shared volume and nested Docker access
```yaml
deployments:
  - name: my-deployment
    entrypoint: flows/my_flow.py:my_flow
    work_pool:
      name: my-docker-pool
      job_variables:
        image: "ghcr.io/you/prefect-app:latest"
        volumes:
          - "/var/run/docker.sock:/var/run/docker.sock"
          - "shared-data:/mnt/shared:rw"
```
- Task that spawns a sibling container
```python
from prefect import task
from prefect_docker.containers import (
    create_docker_container,
    start_docker_container,
    get_docker_container_logs,
    remove_docker_container,
)

@task
def run_sidecar():
    c = create_docker_container(
        image="alpine:3.20",
        command=["sh", "-c", "echo 'sidecar sees:' && ls -la /mnt/shared"],
        volumes=["shared-data:/mnt/shared:rw"],
        detach=True,
    )
    try:
        start_docker_container(c.id)
        # follow=True blocks until the container exits
        return get_docker_container_logs(c.id, follow=True)
    finally:
        # clean up even on failure paths
        remove_docker_container(c.id, force=True)
```
Sanity checklist
- Worker has the Docker socket mounted? Then it can launch flow-run containers.
- Flow-run container needs to launch siblings? It must also have the Docker socket mounted.
- Need to share files? Use a named volume or the same absolute host path in both containers and in the sibling's creation request.
- Dockerfile/entrypoint aren't "passed through" the worker; they're baked into images during prefect deploy.
If you can share:
- Do you need to share specific files between flow-run and sibling containers? Named volume or host path?
- Are you trying to build images at runtime from inside the flow?
I'll tailor the exact prefect.yaml and volume mounts to your setup. With only a slight sigh.

Darius Tan (08/15/2025, 8:32 AM)
Marvin (08/15/2025, 8:33 AM)
Path semantics are the same whether you shell out to the Docker CLI or use the prefect-docker library. They just speak to the same daemon.
How paths are resolved depends on the daemon:
- Using host Docker via socket mount (Docker-outside-of-Docker)
  - Your worker container talks to the node's Docker daemon through /var/run/docker.sock.
  - In any volume spec host_path:container_path, the left side is a path on the node.
  - The worker's own filesystem is invisible unless a node path is mounted into it and then used as the host side for volumes.
  - Best to use named volumes or absolute node paths and mount them consistently into flow-run containers and any sibling containers you spin up from a task.
- Using DinD (Docker-in-Docker)
  - Your worker talks to a Docker daemon running inside a container (usually docker:dind).
  - In volume specs, the left side is a path inside the DinD container's filesystem, not the node.
  - If you want files to persist or be shared, you must mount host paths or named volumes into the DinD container first, then use those paths on the left side.
  - You'll typically set DOCKER_HOST=tcp://dind:2375 (or similar) for the worker. DinD usually requires --privileged.
  - Networking/persistence are isolated inside the DinD daemon unless you expose/mount accordingly.
prefect-docker vs Docker CLI
- No difference in path semantics. prefect_docker.containers.create_docker_container(...) and docker run both submit a create request to whatever daemon `DOCKER_HOST`/socket points to. The daemon decides what the "host" path is.
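As an illustration (hedged; the volume name is a placeholder), these two forms send the same create request to whichever daemon your client points at:
```python
# CLI form:  docker run -v shared-data:/mnt/shared alpine:3.20 ls /mnt/shared
# prefect-docker form: same request; "shared-data" resolves on the daemon's side either way.
from prefect_docker.containers import create_docker_container

create_docker_container(
    image="alpine:3.20",
    command=["ls", "/mnt/shared"],
    volumes=["shared-data:/mnt/shared"],
)
```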
Entrypoints / Dockerfiles
- Not passed at runtime. They're baked into the image you build during prefect deploy. The worker just pulls and runs the image.
Two minimal patterns
1) Host socket (recommended for simplicity)
- Worker container
```bash
docker run --rm -it \
  -e PREFECT_API_URL=... -e PREFECT_API_KEY=... \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.prefect:/root/.prefect \
  prefecthq/prefect:3-latest \
  prefect worker start --pool my-docker-pool
```
- Deployment job variables (share data and allow nested Docker)
```yaml
job_variables:
  image: ghcr.io/you/prefect-app:latest
  volumes:
    - "shared-data:/mnt/shared:rw"
    - "/var/run/docker.sock:/var/run/docker.sock"
```
- Task spawning a sibling container uses the same named volume:
```python
create_docker_container(
    image="alpine:3.20",
    command=["sh", "-c", "ls -la /mnt/shared && echo done"],
    volumes=["shared-data:/mnt/shared:rw"],
    detach=True,
)
```
2) DinD (if you need isolation)
- docker-compose style
```yaml
services:
  dind:
    image: docker:dind
    privileged: true
    environment:
      - DOCKER_TLS_CERTDIR=
    volumes:
      - dind-lib:/var/lib/docker
      - workspace:/workspace  # shared in-DinD path
  worker:
    image: prefecthq/prefect:3-latest
    environment:
      - PREFECT_API_URL=...
      - PREFECT_API_KEY=...
      - DOCKER_HOST=tcp://dind:2375  # talk to the DinD daemon
    depends_on: [dind]
    command: prefect worker start --pool my-docker-pool
    volumes:
      - workspace:/workspace  # optional: so your code can write into a path DinD can see
volumes:
  dind-lib:
  workspace:
```
- Deployment job variables (the left side is inside the DinD container!)
```yaml
job_variables:
  image: ghcr.io/you/prefect-app:latest
  volumes:
    - "/workspace/shared:/mnt/shared:rw"
```
Checklist to avoid path chaos
- Identify the daemon:
  - Host socket: the left side is the node's filesystem.
  - DinD: the left side is the DinD container's filesystem.
- Share data:
  - Prefer named volumes.
  - If you must use bind mounts, use absolute paths and mount them consistently in both the flow-run and any sibling containers.
- Need nested Docker from flow runs?
  - Mount the Docker socket where appropriate (host socket or DinD TCP/socket) into the flow-run container via deployment volumes.
Docs

Darius Tan (08/15/2025, 9:41 AM)

Marvin (08/15/2025, 9:42 AM)
A) Run your prebuilt image as the flow-run image
- Point the deployment's job_variables.image at it, and control pull behavior with image_pull_policy (IfNotPresent/Always/Never).
Example `prefect.yaml`:
```yaml
deployments:
  - name: run-image-directly
    entrypoint: flows/my_flow.py:my_flow  # included in the image
    work_pool:
      name: my-docker-pool
      job_variables:
        image: "ghcr.io/you/your-flow-image:2025-08-15"
        env:
          APP_MODE: prod
        image_pull_policy: IfNotPresent
        auto_remove: true
```
B) Keep a minimal controller flow and run a sidecar container (no volumes)
- Your flow just invokes a prebuilt image via the Docker daemon.
- Pass inputs via env vars or CLI args; send outputs over the network (API, S3/GCS, DB, queue) or read logs. Without volumes, you can't write files to the host.
- Works the same whether you use prefect-docker or the Docker CLI; both talk to whichever Docker daemon you configured.
Minimal task (no volumes) using prefect-docker:
```python
from prefect import flow, task
from prefect_docker.containers import (
    create_docker_container,
    start_docker_container,
    get_docker_container_logs,
    remove_docker_container,
)
from prefect_docker.images import pull_docker_image

@task
def run_image(image: str, command=None, env: dict | None = None, pre_pull: bool = True) -> str:
    if pre_pull:
        pull_docker_image(image)  # optional but explicit
    c = create_docker_container(
        image=image,
        command=command,  # e.g. ["sh", "-c", "my-app --flag"]
        environment=env or {},
        detach=True,
        auto_remove=False,  # we'll clean up ourselves
    )
    try:
        start_docker_container(c.id)
        logs = get_docker_container_logs(c.id, follow=True)
        return logs
    finally:
        try:
            remove_docker_container(c.id, force=True)
        except Exception:
            pass

@flow
def controller():
    logs = run_image(
        image="ghcr.io/you/my-job:stable",
        command=None,  # use the image entrypoint
        env={"CONFIG_MODE": "prod"},
    )
    return logs
```
When to choose which
- Option A (flow-run image) if the container you want to run is, effectively, "the flow." It's the canonical Prefect pattern and simplest operationally.
- Option B (sidecar) if your flow orchestrates multiple independent tools/images, and you're happy to pass data via network APIs or logs (no local file sharing).
Notes and gotchas
- Without volumes, any dynamic inputs/outputs must be handled via env vars/args and remote storage/services. You cannot "smuggle" files in or out.
- If the image must pull secrets/configs at runtime, inject them as env vars via deployment job_variables.env or Prefect blocks, not files.
- Pull behavior:
  - For flow-run images, set job_variables.image_pull_policy.
  - For sidecars, either call pull_docker_image(image) first or let Docker pull on first run.
If you share whether your target image already contains the flow code (Option A) or is a separate tool you want to call (Option B), I'll pin down the exact prefect.yaml and task code. Otherwise, I'll assume the cleanest thing and be only mildly disappointed.

Darius Tan (08/15/2025, 10:04 AM)
Marvin (08/15/2025, 10:06 AM)
Parent flow kicking off the Dockerized child and handing data over via storage URIs:
```python
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent_flow():
    # 1) Produce input and upload to storage
    input_uri = "s3://my-bucket/input/data.parquet"  # write this in a prior task

    # 2) Kick off the Dockerized child as a subflow (tracked in the UI)
    run_deployment(
        name="docker-child/my-deployment",
        parameters={"input_uri": input_uri},
        as_subflow=True,
    )

    # 3) The child writes outputs to storage; agree on an output URI convention:
    output_uri = "s3://my-bucket/output/result.parquet"
    # Read it in a subsequent task if needed
```
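If the child persists its results, the parent can also read the child's return value straight from the final state instead of relying on a URI convention; a hedged sketch, assuming result persistence is enabled on the child flow:
```python
# Assumes the child flow runs with persist_result=True so its return value
# can be rehydrated from the final state here in the parent.
fr = run_deployment(
    name="docker-child/my-deployment",
    parameters={"input_uri": input_uri},
)
if fr.state is not None:
    output_uri = fr.state.result()  # the child flow's return value
```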
Child flow (Docker deployment) reading from storage and writing back
- Bake all deps, entrypoints, and configs into the image.
- Use env vars or blocks for credentials; write outputs back to storage.
```python
from prefect import flow, task

@task
def process(input_uri: str) -> str:
    # download input_uri -> process -> upload output_uri
    output_uri = input_uri.replace("/input/", "/output/")  # for illustration only
    return output_uri

@flow
def docker_child_flow(input_uri: str) -> str:
    return process(input_uri)
```
Child deployment (Docker) in prefect.yaml
- Points at your prebuilt image. No volumes needed.
```yaml
deployments:
  - name: my-deployment
    entrypoint: docker_child_flow.py:docker_child_flow
    work_pool:
      name: my-docker-pool
      job_variables:
        image: "ghcr.io/you/child-image:2025-08-15"
        image_pull_policy: IfNotPresent
        auto_remove: true
        env:
          MODE: prod
```
If you insist on files without storage
- The parent must write to a host path, e.g., /srv/shared.
- Mount the same host path into the child container via job_variables.volumes:
  - /srv/shared:/srv/shared:rw
- Be mindful of which daemon you're using:
  - Host socket: the left side is the host path.
  - DinD: the left side is the DinD container's path, so you must mount /srv/shared into the DinD container first. Otherwise, enjoy the abyss.
FAQs
- run_deployment lets you spawn that Dockerized child: available via prefect.deployments (we verified the signature).
If you tell me which storage you prefer (S3/GCS/Azure/NFS) and where your parent runs (VM? container?), I'll give you the exact mounts or SDK snippets. I enjoy eliminating path bugs almost as much as I enjoy existential dread.

Darius Tan (08/15/2025, 10:10 AM)
Marvin (08/15/2025, 10:12 AM)
docker-compose with a shared named volume:
```yaml
services:
  prefect-server:
    image: prefecthq/prefect:3-latest
    # ...
  prefect-worker:
    image: prefecthq/prefect:3-latest
    command: prefect worker start --pool my-docker-pool
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # talk to the host Docker daemon
  my-api:
    image: your/api:latest
    volumes:
      - shared-data:/mnt/shared
  mlflow:
    image: ghcr.io/mlflow/mlflow
    volumes:
      - shared-data:/mnt/shared
volumes:
  shared-data:
```
Prefect deployment: mount the same volume into flow-run containers
```yaml
deployments:
  - name: dockerized-flow
    entrypoint: flows/dockerized_flow.py:my_flow
    work_pool:
      name: my-docker-pool
      job_variables:
        image: ghcr.io/you/your-flow-image:dev
        volumes:
          - "shared-data:/mnt/shared:rw"
        auto_remove: true
        image_pull_policy: IfNotPresent
```
Use URI-based IO with fsspec (works now with file://, later with s3://)
```python
# pip install fsspec  # later: pip install s3fs
import os
import fsspec

STORE_BASE = os.environ.get("STORE_BASE", "file:///mnt/shared/")  # later: s3://your-bucket/

def uri(*parts: str) -> str:
    # plain string joining; urllib.parse.urljoin mishandles non-http schemes like s3://
    return "/".join([STORE_BASE.rstrip("/"), *parts])

def write_text(u: str, text: str) -> str:
    with fsspec.open(u, "w") as f:
        f.write(text)
    return u

def read_text(u: str) -> str:
    with fsspec.open(u, "r") as f:
        return f.read()
```
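Quick usage of those helpers (a hedged sketch; for local file:// paths, make sure the target directory exists or enable fsspec's auto_mkdir):
```python
# Round-trips through the store; the same code works for file:// today and s3:// later.
u = uri("inputs", "hello.txt")
write_text(u, "hello world")
assert read_text(u) == "hello world"
```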
Parent/child flow pattern (all inside Docker, or parent calls a separate Dockerized subflow)
- Inside one flow run (all tasks in the same container), just read/write file:///mnt/shared/… as needed.
- If splitting into a separate Dockerized subflow, pass only URIs between them.
```python
from prefect import flow, task
from prefect.deployments import run_deployment

@task
def produce_input() -> str:
    u = uri("inputs", "data.txt")
    return write_text(u, "hello world")

@task
def process_data(input_uri: str) -> str:
    text = read_text(input_uri)
    out_uri = uri("outputs", "result.txt")
    return write_text(out_uri, text.upper())

@flow
def parent_flow():
    input_uri = produce_input()
    # Option A: next task in the same container
    result_uri = process_data(input_uri)
    # Option B: call the Dockerized child deployment (different container)
    run_deployment(
        name="dockerized-flow/my-deployment",
        parameters={"input_uri": input_uri},
        as_subflow=True,
    )
```
Later migration to S3
- Set STORE_BASE=s3://your-bucket/prefix/
- pip install s3fs
- Ensure AWS creds are available (env vars, IRSA, etc.)
- Remove the volume from deployments if you no longer need local files.
Notes
- Prefer the host Docker socket over DinD for simplicity.
- Keep paths consistent: always use /mnt/shared inside containers.
- Only pass lightweight metadata/URIs between tasks/flows; keep heavy data in the "store."
Docs
- Docker worker guide: Prefect 3 Docker worker

Darius Tan (08/15/2025, 10:40 AM)
Marvin (08/15/2025, 10:41 AM)
- Yes: to run one step of the workflow on different infrastructure, invoke the Dockerized child via run_deployment(...).
- Yes: if you set as_subflow=True (the default), the child run will appear nested under the parent in the UI with proper links.
Why
- A deployment defines the infra for the entire flow run. To switch infra mid-workflow, you invoke a separate deployment.
- Calling a flow function directly keeps the parent's infra. Calling run_deployment executes it under the child deployment's infra (Docker).
Minimal pattern
1) Two deployments (can be the same flow code or a different flow):
- Parent (non-Docker), e.g. a process worker or your local run
- Child (Docker), on a Docker work pool with its image
prefect.yaml (excerpt)
```yaml
deployments:
  - name: parent-non-docker
    entrypoint: flows/parent.py:parent_flow
    work_pool:
      name: my-process-pool  # or however you run the parent
      job_variables: {}
  - name: child-docker
    entrypoint: flows/child.py:child_flow
    work_pool:
      name: my-docker-pool
      job_variables:
        image: "ghcr.io/you/child-image:dev"
        image_pull_policy: IfNotPresent
        auto_remove: true
        # For local file sharing with a non-docker parent, mount a host path:
        # volumes:
        #   - "/srv/shared:/srv/shared:rw"
```
2) Parent calls the Dockerized child
```python
from prefect import flow
from prefect.deployments import run_deployment

@flow
def parent_flow():
    # do non-docker things
    # then call the docker child
    fr = run_deployment(
        name="child-flow/child-docker",  # flow-name/deployment-name
        parameters={"input_uri": "file:///srv/shared/inp.txt"},
        as_subflow=True,  # default True, nests nicely in the UI
        # Optional per-run overrides:
        # job_variables={"image": "ghcr.io/you/child-image:abc123"},
    )
    return fr
```
UI behavior
- The child run shows nested under the parent run when as_subflow=True (the default). You'll see linked runs and state propagation in the timeline.
Local file handoff today, S3 later
- If your parent is truly non-Docker (runs on the host) and the child is Docker:
  - Use a host bind path (e.g., /srv/shared) that the parent writes to directly, and mount that same host path into the Docker deployment via job_variables.volumes: ["/srv/shared:/srv/shared:rw"].
- If both parent and child run in Docker on the same node:
  - Prefer a named volume (e.g., shared-data:/mnt/shared) mounted into both containers.
- To migrate to S3 later, switch to URI-based IO via fsspec now (use file:///srv/shared/... today; later flip to s3://bucket/... and install s3fs). Keep passing URIs between parent and child; a sketch of the flip follows below.
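The eventual flip, sketched against the compose shape later in this thread (hedged; STORE_BASE comes from the fsspec helpers above, and the AWS variable names are the standard ones):
```yaml
services:
  prefect-worker-process:
    environment:
      - STORE_BASE=file:///srv/shared/          # today: local handoff
      # - STORE_BASE=s3://your-bucket/prefix/   # later: same flow code, S3 store
      # - AWS_ACCESS_KEY_ID=...
      # - AWS_SECRET_ACCESS_KEY=...
```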
CLI you’ll actually use
- Create a Docker work pool: prefect work-pool create my-docker-pool --type docker
- Start a worker: prefect worker start --pool my-docker-pool
- Deploy: prefect deploy
Docs
- Docker worker how-to: Docker worker in Prefect 3
- Running deployments from flows: prefect.deployments.run_deployment
(default as_subflow=True
)
If you confirm whether your parent runs as a process on the host or in a container, I’ll give you the exact volume/URI setup to avoid any “where did my file go?” moments.Darius Tan
Marvin (08/15/2025, 11:01 AM)
- Don't chain two workers in one container command: prefect worker start blocks. With &&, the first worker will run forever and the second will never start.
- You could background one with &, but then you're in init/supervisor land. It's brittle. Use two services.
Recommended docker-compose shape
- One service for the process worker
- One service for the Docker worker
- An optional one-time "init" service to create/update work pools
Example
```yaml
services:
  prefect-init:
    image: prefecthq/prefect:3-latest
    depends_on:
      - prefect-server
    environment:
      - PREFECT_API_URL=http://prefect-server:4200/api
    command: >
      bash -lc "
      prefect work-pool create 'local-pool' --type process || true &&
      prefect work-pool create 'local-docker-pool' --type docker || true
      "
    restart: "no"

  prefect-worker-process:
    image: prefecthq/prefect:3-latest
    depends_on:
      - prefect-server
      - prefect-init
    environment:
      - PREFECT_API_URL=http://prefect-server:4200/api
      - PYTHONPATH=/home/worker/src
    working_dir: /home/worker
    volumes:
      - ./src:/home/worker/src
      - prefect_data:/opt/prefect
      - shared_data:/mnt/shared  # optional; only if your process tasks need it
    command: >
      bash -lc "prefect worker start --pool local-pool"

  prefect-worker-docker:
    image: prefecthq/prefect:3-latest
    depends_on:
      - prefect-server
      - prefect-init
    environment:
      - PREFECT_API_URL=http://prefect-server:4200/api
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - prefect_data:/opt/prefect
      # Do NOT rely on this container's mounts for flow-run volumes;
      # those are set in deployment job_variables.
    command: >
      bash -lc "prefect worker start --pool local-docker-pool"

volumes:
  prefect_data:
  # Use an external named volume so containers launched by the Docker worker
  # can mount it by the same name
  shared_data:
    external: true
    name: shared_data
```
Important notes
- External named volume: if you want to mount a named volume into flow-run containers via deployment job_variables (volumes: ["shared_data:/mnt/shared"]), the volume must exist globally with that exact name. Compose otherwise prefixes volume names (e.g., project_shared_data), and your worker-launched containers won't find "shared_data". Create it once:
```
docker volume create shared_data
```
  Then declare it as external: true with name: shared_data in compose.
- Alternative: bind-mount a host path instead of a named volume (avoids name-scoping issues):
  - In compose services: - /srv/shared:/mnt/shared
  - In Docker deployments' job_variables: - "/srv/shared:/mnt/shared"
  - The left side must be a real host path that the Docker daemon can see.
- The Docker worker's own container mounts do not automatically propagate to flow-run containers. Volumes for flow-run containers are defined in the deployment's job_variables.volumes.
- Pool creation:
  - Better as a one-time init job (as shown) rather than on every worker start.
  - || true keeps it idempotent if the pool already exists.
- Your PREFECT_API_URL for Server is fine (http://prefect-server:4200/api) inside the Compose network.
- UI: when the parent calls run_deployment(..., as_subflow=True) for the Dockerized child, you'll see the child run nested under the parent in the UI.
If you want, I can sketch the matching prefect.yaml for both your process and docker deployments, including the correct job_variables.volumes for either the external named volume or the host bind path. I promise only a minimal amount of sardonic commentary.