<@ULVA73B9P> Is it possible to install python depe...
# ask-marvin
r
@Marvin Is it possible to install Python dependencies during pull steps? I have common deps among flows, but I have one flow where I must install dependencies dynamically, so I cannot bake them into a Docker image. How can I handle this case?
m
thought for 285.1 seconds
Short answer: yes. Pull steps are still part of `prefect.yaml` deployments in Prefect 3.x, and a pull step can run `pip install` before the flow is loaded (for example the `pip_install_requirements` or `run_shell_script` steps) if you deploy with `prefect.yaml`. You can also install packages dynamically at run time in other ways. The best approach depends on where your flow runs (Docker, Kubernetes, or Process worker). A few solid patterns:
- Docker worker: add a small bootstrap command to the container entrypoint/command that installs extras before the flow starts. For example, keep a requirements-extras.txt in your repo (fetched by `from_source`) and run `pip install -r requirements-extras.txt` before Prefect runs your flow. See Docker infra guidance: Docker deployments (Prefect 3.x docs) and workers overview: Workers (concepts).
- Kubernetes worker: use an initContainer or a custom command on the main container to install the extras, then start the flow. See Kubernetes deployments (Prefect 3.x docs). A common pattern is:
  - Mount a writable volume (e.g., /opt/site-packages)
  - Init container: `pip install --target /opt/site-packages -r /workspace/extra-requirements.txt`
  - Main container: set `PYTHONPATH=/opt/site-packages` before starting the flow
- Process worker (or anywhere): install at the top of the flow, then import. If you can’t bake the deps into the image, you can install to a local directory and add it to `sys.path`. This avoids needing to restart the interpreter.
Example: install-on-the-fly inside your flow
from prefect import flow, task, get_run_logger
import sys, subprocess, tempfile
from pathlib import Path

@task
def ensure_dynamic_deps(packages: list[str]) -> str:
    target = Path(tempfile.gettempdir()) / "prefect_extra_deps"
    target.mkdir(parents=True, exist_ok=True)
    cmd = [
        sys.executable, "-m", "pip", "install",
        "--no-input", "--no-cache-dir",
        "--target", str(target),
        *packages,
    ]
    subprocess.check_call(cmd)
    # make installed packages importable in this process
    if str(target) not in sys.path:
        sys.path.insert(0, str(target))
    return str(target)

@flow
def my_flow():
    # pin versions for reproducibility
    ensure_dynamic_deps(["somepkg==1.2.3", "otherpkg>=0.9"])
    import somepkg  # now available
    # ... rest of your flow logic ...

if __name__ == "__main__":
    my_flow()
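A possible refinement of the example above, as a minimal sketch: check which modules are already importable and only call pip for the rest, so repeated runs in a warm environment skip the install. The helper name `missing_modules` is hypothetical, and the sketch assumes each package's top-level import name is known (it can differ from the distribution name you pass to pip).

```python
import importlib.util


def missing_modules(import_names: list[str]) -> list[str]:
    """Return the top-level import names that cannot currently be resolved."""
    return [
        name
        for name in import_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # "json" ships with Python; the second name is made up for illustration.
    print(missing_modules(["json", "definitely_not_installed_pkg_xyz"]))
```

Inside the flow you could call this first and pass only the corresponding requirement strings to `ensure_dynamic_deps`.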
Notes and tradeoffs:
- Task runner: if you use a ProcessPoolTaskRunner, modifying `sys.path` in one process won’t affect others. Prefer the default or a ThreadPoolTaskRunner for this pattern, or run the install step in the flow function before any tasks that import those packages.
- Speed/caching: consider using a persistent pip cache (mount `PIP_CACHE_DIR` in Docker/K8s), hosting wheels internally, or using uv/mamba for faster installs.
- Security/reproducibility: pin versions and treat dynamic installs as a last resort. When possible, keep a base image with common deps and layer dynamic extras on top at runtime.
- Custom worker: for fully standardized pre-start behavior, you can build a custom worker template that runs a pre-flight script to install extras before the flow process starts. See Developing a custom worker.
If you’re actually on Prefect 2.x, pull steps can run shell commands like `pip install -r ...` as part of the deployment steps, but we recommend moving to 3.x. Reference for 2.x: Prefect 2.x deployment guide.
Can you share:
- Which Prefect version you’re on (2.x or 3.x)?
- Your worker/infrastructure (Docker worker, Kubernetes worker, or Process worker)?
- Whether you can add a small bootstrap command to your container entrypoint?
With that, I can give you a concrete snippet (Docker/K8s job template or flow code) tailored to your setup.
r
@Marvin I'm using Prefect 3.22.0 with a Kubernetes worker. I can bundle my code and base packages in a Docker image. One of my 3 flows will work out of the box, but the other 2 need to install dependencies dynamically. I can retrieve the names of those dependencies from a Prefect variable. Is it possible to install those dependencies before the flow starts without putting them into the Docker image?
m
thought for 101.5 seconds
Yes. In Prefect 3.x with a Kubernetes worker, you can install extra Python packages “just before” your flow starts without baking them into the image. The most robust pattern is to use an initContainer that installs deps into a shared volume, then have the main container include that path on PYTHONPATH.
High-level approach
- Add an emptyDir volume to the pod.
- Add an initContainer that:
  - Reads the package list (e.g., from an env var)
  - Runs pip install into the shared volume path (e.g., /opt/pydeps)
- Mount the same volume in the main container and set PYTHONPATH to include it.
- Set the package list differently per deployment (e.g., from a Prefect Variable or a deployment/job variable).
Why this works
- The initContainer runs to completion before your flow container starts, so packages are in place when the flow starts.
- You can keep your base image small and install only what you need at run time.
- You can vary the deps per flow via deployment/job variables without code changes.
Example pod spec fragment for your Kubernetes work pool base job template
Add these fields to your K8s worker’s base job template (in the “pod spec” section). The exact UI/JSON shape depends on how you manage the template, but these are the Kubernetes-native fields you need:
volumes:
  - name: pydeps
    emptyDir: {}

initContainers:
  - name: install-dynamic-deps
    image: YOUR_FLOW_IMAGE  # match Python/pip with the main container
    command: ["/bin/sh","-lc"]
    args:
      - |
        set -e
        if [ -n "$EXTRA_PY_DEPS" ]; then
          echo "Installing dynamic deps: $EXTRA_PY_DEPS"
          python - <<'PY'
        import os, sys, subprocess
        pkgs = os.environ.get("EXTRA_PY_DEPS","").split()
        if pkgs:
            subprocess.check_call([sys.executable, "-m", "pip", "install",
                                   "--no-cache-dir",
                                   "--target", "/opt/pydeps", *pkgs])
        PY
        else
          echo "No EXTRA_PY_DEPS set; skipping"
        fi
    env:
      - name: EXTRA_PY_DEPS
        value: ""  # override per-deployment via job variables
    volumeMounts:
      - name: pydeps
        mountPath: /opt/pydeps

containers:
  - name: prefect-job
    env:
      - name: PYTHONPATH
        value: /opt/pydeps  # $(PYTHONPATH) is only expanded if PYTHONPATH is defined earlier in this env list
    volumeMounts:
      - name: pydeps
        mountPath: /opt/pydeps
How to pass the packages list per flow
- Easiest: set an env var (EXTRA_PY_DEPS) in the deployment’s job variables so each flow’s deployment can specify different packages. For example: “pydantic==2.9.2 httpx<1.0 numpy==1.26.4”.
- If you prefer Prefect Variables:
  - Option A (template at deploy-time): Reference the workspace variable in the deployment/job variables so it’s rendered into EXTRA_PY_DEPS. See variables and templating: Variables (Prefect 3.x docs).
  - Option B (fetch at run-time in initContainer): Change the initContainer command to fetch the variable directly:
python - <<'PY'
import subprocess, sys
from prefect.variables import Variable
# assumes the variable holds a whitespace-separated string of requirements
pkgs = Variable.get("YOUR_VARIABLE_NAME", default="") or ""
if pkgs.strip():
    subprocess.check_call([sys.executable, "-m", "pip", "install",
                           "--no-cache-dir", "--target", "/opt/pydeps", *pkgs.split()])
PY
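The snippet above assumes the variable is a plain whitespace-separated string. As a rough sketch, assuming the variable might instead hold a JSON list, the helpers below normalize either shape and flag entries that are not pinned with `==`. The names `parse_requirements` and `unpinned` are hypothetical, and the regular expression is a deliberately simple check, not a full requirement-specifier parser.

```python
import re

# Matches "name==version", optionally with extras such as "pkg[extra]==1.0".
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9,._-]+\])?==[^=\s]+$"
)


def parse_requirements(value) -> list[str]:
    """Normalize a variable value (string, list of strings, or None) to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return value.split()
    return [str(item).strip() for item in value if str(item).strip()]


def unpinned(requirements: list[str]) -> list[str]:
    """Return the requirement strings that are not of the form name==version."""
    return [req for req in requirements if not _PINNED.match(req)]


if __name__ == "__main__":
    reqs = parse_requirements("pydantic==2.9.2 httpx<1.0 numpy==1.26.4")
    print(reqs)
    print(unpinned(reqs))
```

An install script could refuse to continue, or just log a warning, when `unpinned` returns anything.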
Note: this requires the pod’s initContainer to have Prefect and API env configured (the Kubernetes worker usually injects PREFECT_API_URL and auth env into the pod; if not, add them to the initContainer env).
Tips
- Use your own package index/cache for speed (PIP_INDEX_URL, PIP_EXTRA_INDEX_URL). You can add these as env vars in the job template and mount a pip cache volume if desired.
- Keep versions pinned in the variable to ensure reproducibility.
- Use the same base image for init and main container to avoid Python/pip ABI mismatches.
Docs
- Kubernetes deployments: Kubernetes deployments (Prefect 3.x docs)
- Workers overview: Workers (concepts)
- Variables: Variables (Prefect 3.x docs)
- Custom worker templates (for deeper customization): Developing a custom worker
If you’d like, share your current Kubernetes work pool base job template (or how you manage it in the UI), and the variable names you plan to use. I can produce a ready-to-paste version tailored to your setup and show exactly where to put the env and initContainer blocks.
r
@Marvin Is it common to install dependencies dynamically using an initContainer, or do people usually do it another way? What about changing the `command` of the image to first install the deps dynamically and then run the flow?
m
thought for 34.4 seconds
Great question. Short answer: yes, both patterns are used. In Kubernetes, using an initContainer with a shared volume is a common, K8s-native way to install dynamic deps before the flow starts. Overriding the container command/entrypoint to run a small bootstrap script is also common and simpler to apply in some setups. Here’s how to think about them.
What most teams do
- Preferred when you can bake everything: build a per-flow image with all deps. Fastest and most reproducible.
- When a subset must be dynamic: initContainer + emptyDir volume + PYTHONPATH is a clean, widely used pattern in K8s. It guarantees packages are installed before your flow container starts and avoids writing to the main image filesystem.
- Also seen: a wrapper entrypoint script in your image that installs extras and then execs the original command. This keeps one container and is easy to roll out, but you must avoid clobbering Prefect’s generated command.
InitContainer approach (recommended for K8s)
Pros:
- Strict ordering: initContainer completes before your flow starts.
- Doesn’t require root in the main container; install to a mounted volume (e.g., /opt/pydeps) and add to PYTHONPATH.
- Keeps your main image immutable.
Cons:
- Slightly more YAML/job-template work.
- If you read Prefect Variables in the initContainer, make sure it has Prefect API env configured.
Command/entrypoint wrapper approach
Pros:
- Simple to roll out; no extra container.
- Can read Prefect Variables at runtime in the same container.
- Easiest if you ship just a small wrapper script in your base image.
Cons:
- If you try to install into system site-packages, you may need root; prefer --target or a user dir.
- You must not override Prefect’s command incorrectly. Best practice is an ENTRYPOINT wrapper that execs the original command so whatever Prefect sets still runs.
Safe entrypoint wrapper example
- Add this entrypoint to your base image. It installs extras to /opt/pydeps when EXTRA_PY_DEPS is set, then execs the command Prefect provides.
#!/bin/sh
set -e

if [ -n "$EXTRA_PY_DEPS" ]; then
  echo "Installing dynamic deps: $EXTRA_PY_DEPS"
  python - <<'PY'
import os, sys, subprocess
pkgs = os.environ.get("EXTRA_PY_DEPS","").split()
if pkgs:
    subprocess.check_call([sys.executable, "-m", "pip", "install",
                           "--no-cache-dir",
                           "--target", "/opt/pydeps", *pkgs])
PY
  # append the existing PYTHONPATH only when it is non-empty (no dangling colon)
  export PYTHONPATH="/opt/pydeps${PYTHONPATH:+:$PYTHONPATH}"
fi

exec "$@"
Dockerfile:
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh && mkdir -p /opt/pydeps
ENTRYPOINT ["/entrypoint.sh"]
# do not hardcode CMD; let Prefect set it per run
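If you would rather keep the wrapper in Python than in shell, the same idea can be sketched as below. This is an illustrative equivalent under the same assumptions as the shell script (packages listed in EXTRA_PY_DEPS, installed to /opt/pydeps); the file name `entrypoint.py` and the helper names are hypothetical.

```python
#!/usr/bin/env python3
import os
import subprocess
import sys

DEPS_DIR = "/opt/pydeps"  # assumption: same target path as the shell example


def pip_command(packages: list[str], target: str = DEPS_DIR) -> list[str]:
    """Build the pip invocation that installs packages into target."""
    return [sys.executable, "-m", "pip", "install",
            "--no-cache-dir", "--target", target, *packages]


def child_env(env: dict[str, str], target: str = DEPS_DIR) -> dict[str, str]:
    """Copy env with target prepended to PYTHONPATH, without a dangling separator."""
    new_env = dict(env)
    existing = env.get("PYTHONPATH", "")
    new_env["PYTHONPATH"] = f"{target}{os.pathsep}{existing}" if existing else target
    return new_env


def main(argv: list[str]) -> None:
    env = dict(os.environ)
    packages = env.get("EXTRA_PY_DEPS", "").split()
    if packages:
        subprocess.check_call(pip_command(packages))
        env = child_env(env)
    # Replace this process with the command passed in (what Prefect provides).
    os.execvpe(argv[0], argv, env)


if __name__ == "__main__":
    # Only exec when a command was actually passed to the wrapper.
    if len(sys.argv) > 1:
        main(sys.argv[1:])
```

In the Dockerfile this would be wired up the same way as the shell version, e.g. `ENTRYPOINT ["python", "/entrypoint.py"]`.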
InitContainer pattern (K8s-native) - Use this when you prefer strict separation and no changes to the main image command:
volumes:
  - name: pydeps
    emptyDir: {}

initContainers:
  - name: install-dynamic-deps
    image: YOUR_FLOW_IMAGE
    command: ["/bin/sh","-lc"]
    args:
      - |
        set -e
        if [ -n "$EXTRA_PY_DEPS" ]; then
          echo "Installing dynamic deps: $EXTRA_PY_DEPS"
          python - <<'PY'
        import os, sys, subprocess
        pkgs = os.environ.get("EXTRA_PY_DEPS","").split()
        if pkgs:
            subprocess.check_call([sys.executable, "-m", "pip", "install",
                                   "--no-cache-dir",
                                   "--target", "/opt/pydeps", *pkgs])
        PY
        else
          echo "No EXTRA_PY_DEPS set; skipping"
        fi
    env:
      - name: EXTRA_PY_DEPS
        value: ""  # set per deployment via job variables or template
    volumeMounts:
      - name: pydeps
        mountPath: /opt/pydeps

containers:
  - name: prefect-job
    env:
      - name: PYTHONPATH
        value: /opt/pydeps  # $(PYTHONPATH) is only expanded if PYTHONPATH is defined earlier in this env list
    volumeMounts:
      - name: pydeps
        mountPath: /opt/pydeps
Feeding the package list
- From Prefect Variables: set a deployment/job var (EXTRA_PY_DEPS) to the value of your Prefect Variable at deploy time, or have the script call `from prefect.variables import Variable; Variable.get("YOUR_VAR")` and install those packages. Docs: Variables (Prefect 3.x docs)
- Pin versions for reproducibility. For speed, consider internal wheels or a cache (PIP_INDEX_URL, PIP_EXTRA_INDEX_URL, PIP_CACHE_DIR).
Avoid
- Kubernetes postStart lifecycle hooks for this purpose; they run concurrently with app start and are racy.
- Modifying system site-packages in a read-only/non-root container. Use `--target` or a user directory + PYTHONPATH.
Docs
- K8s deployments: Kubernetes deployments (Prefect 3.x docs)
- Workers: Workers (concepts)
- Custom worker templates: Developing a custom worker
If you share how you manage your K8s work pool base job template (UI or as JSON/YAML) and the variable name holding the package list, I can provide an exact snippet you can paste in.
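To make the "set EXTRA_PY_DEPS per deployment" idea concrete, here is a hedged sketch that passes the package list through the `env` job variable at deploy time. The repository URL, entrypoints, deployment names, and work pool name are all placeholders. As far as I know, the default Kubernetes job template applies `env` to the main flow container only, so this pairs naturally with the entrypoint wrapper; for the initContainer variant the base job template would need to reference the value itself.

```python
def job_variables_for(packages: list[str]) -> dict:
    """Build job variables that expose the package list as EXTRA_PY_DEPS."""
    return {"env": {"EXTRA_PY_DEPS": " ".join(packages)}}


def deploy_all() -> None:
    # Imported lazily so the helper above can be used without Prefect installed.
    from prefect import flow

    # Hypothetical entrypoints and package lists, one entry per deployment.
    deployments = {
        "flow-a": ("flows/a.py:flow_a", ["pydantic==2.9.2"]),
        "flow-b": ("flows/b.py:flow_b", ["numpy==1.26.4", "httpx==0.27.0"]),
    }
    for name, (entrypoint, packages) in deployments.items():
        flow.from_source(
            source="https://github.com/your-org/your-repo",  # placeholder
            entrypoint=entrypoint,
        ).deploy(
            name=name,
            work_pool_name="your-k8s-pool",  # placeholder
            job_variables=job_variables_for(packages),
        )


if __name__ == "__main__":
    # deploy_all() is not called here because it needs a reachable Prefect API.
    print(job_variables_for(["pydantic==2.9.2", "numpy==1.26.4"]))
```

Each deployment then carries its own package list without any change to the flow code.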