# ask-marvin
@Marvin How can I set up my workers so that they always run with code from the latest commit on the main branch of my repository?
thought for 429.0 seconds
Great question. In Prefect 3.x, the easiest way to ensure workers always run the latest code from your repo's main branch is to have your deployment pull your code at run time from Git. There are two common patterns.

1) Prefect project + pull steps (CLI)

Create a Prefect project with a pull step that clones your repo's main branch on every run. Example prefect.yaml:
name: my-project

pull:
  - prefect.deployments.steps.pull.git_clone:
      repository: https://github.com/acme/data-pipelines.git
      branch: main
  - prefect.deployments.steps.utility.pip_install_requirements:
      requirements_file: requirements.txt

deployments:
  - name: prod
    entrypoint: flows/etl.py:etl
    work_pool:
      name: process-pool
- Then, from the project root run:
prefect deploy
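For a private repo, the same pull step can reference a Secret block instead of a hard-coded token; a sketch, assuming a Secret block named `github-token` has already been created in your workspace:

```yaml
pull:
  - prefect.deployments.steps.pull.git_clone:
      repository: https://github.com/acme/data-pipelines.git
      branch: main
      # Templated reference to a pre-created Secret block
      # (block name "github-token" is an assumption)
      access_token: "{{ prefect.blocks.secret.github-token }}"
```

This keeps the token out of version control; the template is resolved when the worker runs the pull step.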
Notes
- This clones fresh on each run, so you always get the HEAD of main.
- For private repos, pass an `access_token` or a credentials block to `git_clone`.
- Docs: Pull steps (`git_clone`); Deployment versioning: pin to branch vs commit

2) Python API with Flow.from_source + GitRepository

Define your deployment to load from a GitRepository that tracks main:
from prefect import flow
from prefect.runner.storage import GitRepository

# The flow itself lives in the repo (flows/etl.py), so this script only
# registers the deployment; no local flow definition is needed.
repo = GitRepository(
    url="https://github.com/acme/data-pipelines.git",
    branch="main",          # track the main branch
    # commit_sha=None       # leave unset to follow the branch head
    # pull_interval=60      # seconds; cached-clone refresh cadence
)

flow.from_source(
    source=repo,
    entrypoint="flows/etl.py:etl",
).deploy(
    name="prod",
    work_pool_name="process-pool",
)
Notes
- Prefect maintains a cached clone and refreshes it periodically; `pull_interval` (default 60 seconds) controls how often it checks for updates. If you need the freshest code on every single run, either lower this interval aggressively or use the pull-step approach above, which clones on each run.
- For rollbacks, set `commit_sha` to a specific commit; to resume tracking the latest, remove it and keep `branch="main"`.
- Docs: Deployment versioning: pin to branch vs commit

Caveats by worker type

Process workers
- Ensure `git` is installed on the worker machine and that it has network access to your repo.
- For private repos, provide credentials via the pull step's `access_token` or a Git credentials block.

Docker workers
- With either pattern above, the clone happens inside the container at runtime. Make sure your base image has `git` installed and can reach your repo.
- Example Dockerfile snippet:
FROM prefecthq/prefect:3-python3.11
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
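If you prefer to bake your Python dependencies into the image rather than install them at run time, the snippet above can be extended; the requirements.txt path is an assumption about your repo layout:

```dockerfile
FROM prefecthq/prefect:3-python3.11
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
# Bake Python deps into the image (assumes requirements.txt at repo root)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```

With deps baked in, you can drop the `pip_install_requirements` pull step and each run only needs the (fast) git clone.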
- Install your Python deps either with the `pip_install_requirements` pull step or bake them into the image.

Kubernetes workers
- Same as Docker: ensure the runtime image has `git` and network access; use pull steps or GitRepository storage. If you bake code into images instead, you'll need CI to build and push on each commit and to ensure pods pull the new image.

Which should you choose?
- Want absolute "latest on every run" with no caching? Use the pull-step approach; it clones the HEAD of main at the start of each run.
- Want to reuse a cached clone that refreshes frequently? Use `Flow.from_source` with `GitRepository(branch="main")` and tune `pull_interval`.

If you share which worker type you're using (process, Docker, Kubernetes) and whether your repo is private, I can tailor an exact config (including credentials handling and a minimal Dockerfile if needed).