<@ULVA73B9P> Is this the right way to write a dep...
# marvin-ai
a
@Marvin Is this the right way to write a deployment if I want all my deployments in the same script and they should be reading from GitHub? Also, how do I specify a branch?
from prefect import flow

github_url = "<https://github.com/path/path.git>"
work_pool_one = "my-work-pool"
ref="master"

if __name__ == "__main__":
    
    deployment_name = "my-github-deployment_one"
    entrypoint = "pipelines/pipeline_one.py:my_flow"
    flow.from_source(
        source=github_url,
        entrypoint=entrypoint,
    ).deploy(
        name=deployment_name,
        work_pool_name=work_pool_one,
    )
    
    deployment_name = "my-github-deployment_two"
    entrypoint = "pipelines/pipeline_two.py:my_flow"
    flow.from_source(
        source=github_url,
        entrypoint=entrypoint,
    ).deploy(
        name=deployment_name,
        work_pool_name=work_pool_one,
    )
m
thought for 264.0 seconds
Short answer: yes — in Prefect 3.x you can define multiple deployments in a single script by calling `flow.from_source(...).deploy(...)` multiple times. Your `entrypoint` strings are correct for flows living under subdirectories. To target a specific Git branch (or a specific commit), pass a `GitRepository` to `source` and set `branch` (or `commit_sha`). Here's a cleaned-up version of your script that reuses the same Git source and sets the branch:
from prefect import flow
from prefect.storage import GitRepository

GITHUB_URL = "<https://github.com/path/path.git>"
WORK_POOL = "my-work-pool"
BRANCH = "master"  # or "main", or use commit_sha="abc123"

if __name__ == "__main__":
    source = GitRepository(url=GITHUB_URL, branch=BRANCH)

    flow.from_source(
        source=source,
        entrypoint="pipelines/pipeline_one.py:my_flow",
    ).deploy(
        name="my-github-deployment_one",
        work_pool_name=WORK_POOL,
    )

    flow.from_source(
        source=source,
        entrypoint="pipelines/pipeline_two.py:my_flow",
    ).deploy(
        name="my-github-deployment_two",
        work_pool_name=WORK_POOL,
    )
Notes:
- `entrypoint` should be `path/to/file.py:flow_function_name`, relative to the repo root.
- If you need reproducibility, prefer pinning `commit_sha` instead of a branch.
- For private repos, pass credentials via `GitRepository(credentials=...)` using a `GitCredentials` block.

If you want, I can show a compact loop-based version that generates many deployments from a list. Do you need to target a private repo or pin to a commit?
a
Ok. I’m just targeting a branch. And it’s a public repo for now.
@Marvin Ok. I'm just targeting a branch, and it's a public repo for now. Also, I get an error: No module named 'prefect.storage'.
m
thought for 39.2 seconds
The pattern you're using (multiple `flow.from_source(...).deploy(...)` calls in one script) is correct. Your `ref` variable isn't used, though. The error `No module named 'prefect.storage'` means your environment is running Prefect 2.x (or older); `prefect.storage` and `GitRepository` are Prefect 3.x features. How to fix:
- Check your version:
python -c "import prefect; print(prefect.__version__)"
- Upgrade to Prefect 3.x (then restart your interpreter/worker):
pip install -U "prefect>=3.0.0"
# or conda-forge:
# conda install -c conda-forge prefect
Once you’re on 3.x, here’s a concise script for a public repo on a specific branch:
from prefect import flow
from prefect.storage import GitRepository

GITHUB_URL = "<https://github.com/path/path.git>"
WORK_POOL = "my-work-pool"
BRANCH = "master"  # or "main"

if __name__ == "__main__":
    source = GitRepository(url=GITHUB_URL, branch=BRANCH)

    flow.from_source(
        source=source,
        entrypoint="pipelines/pipeline_one.py:my_flow",
    ).deploy(
        name="my-github-deployment_one",
        work_pool_name=WORK_POOL,
    )

    flow.from_source(
        source=source,
        entrypoint="pipelines/pipeline_two.py:my_flow",
    ).deploy(
        name="my-github-deployment_two",
        work_pool_name=WORK_POOL,
    )
Notes:
- `entrypoint` is `relative/path/to/file.py:flow_function_name` from the repo root.
- If you later need reproducibility, use `commit_sha="..."` instead of `branch=...`.

If you can't upgrade right now and must stay on Prefect 2.x, say so and I'll outline the 2.x approach, but I do recommend moving to 3.x.
a
@Marvin It's prefect.runner.storage. That works. Can you show me how to do this: "I can show a compact loop-based version that generates many deployments from a list." Also, can you explain what the development lifecycle might be for something like this? Like, I want to have developers only working on development code (presumably a variable can be passed to each script to signify that it should run as a dev pipeline). However, when merged into GitHub, it should be deployed to a different VM where a prod variable is passed into each script instead (which in turn is used to run it as a prod pipeline). Or something like that.
m
thought for 57.1 seconds
Great — yes, `prefect.runner.storage` is the right import in 3.x for `GitRepository`. Here's a compact, loop-based pattern that generates many deployments from a list. It also shows how to pass an `env` parameter to each deployment so your flows can switch behavior for dev vs prod. Example: single script, deploy multiple flows for a given environment:
from prefect import flow
from prefect.runner.storage import GitRepository

REPO_URL = "<https://github.com/org/repo.git>"

# Each item: (entrypoint, short_name)
ENTRYPOINTS = [
    ("pipelines/pipeline_one.py:my_flow", "pipeline-one"),
    ("pipelines/pipeline_two.py:my_flow", "pipeline-two"),
    # add as many as you like...
]

def deploy_all(env: str, branch: str, work_pool: str, commit_sha: str | None = None):
    source = GitRepository(url=REPO_URL, branch=branch, commit_sha=commit_sha)
    for entrypoint, short_name in ENTRYPOINTS:
        flow.from_source(source=source, entrypoint=entrypoint).deploy(
            name=f"{short_name}-{env}",
            work_pool_name=work_pool,
            parameters={"env": env},   # flow reads this to switch dev/prod paths
            tags=[env],                # optional: helps filtering in UI
        )

if __name__ == "__main__":
    # Example: run one of these per environment (typically from CI)

    # Dev deployments (dev branch -> dev work pool)
    deploy_all(env="dev", branch="develop", work_pool="dev-pool")

    # Prod deployments (main branch -> prod work pool, optionally pin a commit for reproducibility)
    # deploy_all(env="prod", branch="main", work_pool="prod-pool", commit_sha=None)
And your flow(s) can accept the `env` parameter to toggle behavior:
from prefect import flow, get_run_logger

@flow
def my_flow(env: str = "dev"):
    logger = get_run_logger()
    logger.info(f"Running in env={env}")
    if env == "prod":
        # connect to prod DB/bucket/etc.
        ...
    else:
        # connect to dev/sandbox
        ...
Recommended development lifecycle
- Source control and branches
  - Dev deployments point to a dev branch (e.g., `develop`) and use `env="dev"`.
  - Prod deployments point to `main` (or pin `commit_sha` for reproducibility) and use `env="prod"`.
- Infra isolation via work pools
  - Run a Prefect worker on your dev VM in a "dev" work pool (e.g., `dev-pool`).
  - Run another worker on your prod VM in a "prod" work pool (e.g., `prod-pool`).
  - Your deployments target the right VM by setting `work_pool_name` accordingly.
- Parameters vs environment variables
  - Use deployment `parameters` (like `{"env": "dev"}`) for behavior flags.
  - Use environment variables/blocks for secrets and connection strings that differ per environment.
  - You can also tag deployments (e.g., `["dev"]`, `["prod"]`) to help filter in the UI.
- CI/CD flow
  - On push to the dev branch, your CI runs the script once for dev: `deploy_all(env="dev", branch="develop", work_pool="dev-pool")`
  - On merge to main, your CI runs the script for prod: `deploy_all(env="prod", branch="main", work_pool="prod-pool", commit_sha=<optional pin>)`
  - Ensure CI has `PREFECT_API_URL` and `PREFECT_API_KEY` set for the correct workspace.
  - Only run the one block relevant to the environment in a given CI job to avoid creating both dev/prod at the same time.
- Optional hardening for prod
  - Pin `commit_sha` in prod deployments for deterministic code.
  - Add schedules, concurrency limits, and alerts to prod deployments.
  - Keep dev schedules paused or ad-hoc only.

Would you like me to tailor the script to your exact branch names and work pool names, or show a GitHub Actions example that runs this per-branch?
a
@Marvin Does it make sense to just have one branch? When pulling code locally, running the code will run dev. But in GitHub, when the deployment script is run by GitHub Actions, it deploys to prod with a prod variable? Whether this is the right approach or your approach, can you explain how the CI actually works and "runs the script for dev", etc.?
m
thought for 89.4 seconds
Short answer:
- Yes, a single-branch strategy can work: default to "dev" behavior locally; in CI set an env var to "prod" and deploy to prod.
- Many teams still prefer two branches (develop/main) for clearer gating and safer review, but it's a tradeoff.
- CI "runs the script for dev/prod" by exporting environment variables (e.g., `DEPLOY_ENV`, `WORK_POOL`) and then executing the same deployment script. Workers on different VMs listen on different work pools, so the same deployments go to the right place.

A compact, loop-based deployment script (single-branch friendly):
- Uses one code path, switches by `DEPLOY_ENV`
- Pins prod deployments to the commit that triggered CI (reproducible)
- Targets separate work pools for dev vs prod
# deploy.py
import os
from prefect import flow
from prefect.runner.storage import GitRepository

REPO_URL = "<https://github.com/org/repo.git>"

# (entrypoint, short_name)
ENTRYPOINTS = [
    ("pipelines/pipeline_one.py:my_flow", "pipeline-one"),
    ("pipelines/pipeline_two.py:my_flow", "pipeline-two"),
]

def deploy_all(env: str, branch: str, work_pool: str, commit_sha: str | None = None):
    source = GitRepository(url=REPO_URL, branch=branch, commit_sha=commit_sha)
    for entrypoint, short_name in ENTRYPOINTS:
        flow.from_source(source=source, entrypoint=entrypoint).deploy(
            name=f"{short_name}-{env}",
            work_pool_name=work_pool,
            parameters={"env": env},
            tags=[env],
        )

if __name__ == "__main__":
    # Defaults make local runs "dev"
    env = os.getenv("DEPLOY_ENV", "dev")            # dev or prod
    branch = os.getenv("DEPLOY_BRANCH", "main")     # single-branch default
    # Use the build's commit for prod pinning (GitHub sets GITHUB_SHA)
    commit_sha = os.getenv("GITHUB_SHA") if env == "prod" else None

    # Choose a work pool based on env unless explicitly provided
    work_pool = os.getenv("WORK_POOL", "dev-pool" if env == "dev" else "prod-pool")

    deploy_all(env=env, branch=branch, work_pool=work_pool, commit_sha=commit_sha)
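For example, the same script can then be invoked for either environment from a shell (a sketch; the pool and branch names are placeholders matching the defaults above):

# Local dev registration (falls back to the script's dev defaults)
python deploy.py

# Prod-style registration of the same deployments
DEPLOY_ENV=prod DEPLOY_BRANCH=main WORK_POOL=prod-pool python deploy.py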
Your flows read the parameter to switch behavior:
from prefect import flow, get_run_logger

@flow
def my_flow(env: str = "dev"):
    log = get_run_logger()
    log.info(f"Running with env={env}")
    if env == "prod":
        # prod connections / tables / buckets / etc.
        ...
    else:
        # dev/sandbox resources
        ...
How CI "runs the script for dev" vs prod:
- CI pulls your repo and runs `python deploy.py` with environment variables that tell it which environment to deploy to, which work pool to use, and which branch/commit to reference.
- Your Prefect workers:
  - A dev worker runs on your dev VM, connected to `dev-pool`.
  - A prod worker runs on your prod VM, connected to `prod-pool`.
- When the script creates deployments with `work_pool_name="dev-pool"`, those runs are picked up only by the dev worker; prod deployments target `prod-pool` and are picked up on the prod VM.

Example GitHub Actions workflow (single-branch model):
- The "dev" job runs on any push except main (you can also tie it to PRs).
- The "prod" job runs only on main and pins `commit_sha` automatically.
```
# .github/workflows/deploy.yml
name: Deploy Prefect
on:
  push:
    branches: ["**"]  # run on any branch; we'll gate jobs below
jobs:
  deploy-dev:
    if: github.ref != 'refs/heads/main'
    runs-on: ubuntu-latest
    env:
      PREFECT_API_URL: ${{ secrets.PREFECT_API_URL }}
      PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
      DEPLOY_ENV: dev
      DEPLOY_BRANCH: main  # single-branch model still deploys from main
      WORK_POOL: dev-pool
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -U "prefect>=3"
      - run: python deploy.py

  deploy-prod:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    env:
      PREFECT_API_URL: ${{ secrets.PREFECT_API_URL }}
      PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
      DEPLOY_ENV: prod
      DEPLOY_BRANCH: main
      WORK_POOL: prod-pool
      # GITHUB_SHA is provided by Actions automatically and read by deploy.py
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -U "prefect>=3"
      - run: python deploy.py
```
Single-branch vs two-branch: which is "right"?
- Single-branch (what you described)
  - Pros: simplest repo model; one source of truth; dev/prod behavior driven by env vars.
  - Cons: merging to main means the next CI run is a prod deploy; use approvals or manual gates to avoid accidental prod pushes.
  - Tips: protect the main branch; use GitHub Environments with required reviewers; pin prod deployments to `GITHUB_SHA` as above.
- Two-branch (develop + main)
  - Pros: clearer separation; dev deploys from `develop`, prod deploys from `main`; safer by default.
  - Cons: slightly more repo ceremony; occasional merge friction.

Either approach works; the key is:
- Isolate execution with separate work pools/VMs per environment.
- Drive behavior with a parameter like `env`, and keep secrets/config per environment (worker environment variables, Prefect Variables, or Blocks).
- Pin prod to a commit for reproducibility.

If you share your actual work pool names and whether you want single-branch or two-branch, I can tailor the `deploy.py` and the Actions workflow exactly to your setup.
a
@Marvin So PREFECT_API_URL, that defines which VM to look at? And how is this being changed/set dynamically?
m
thought for 150.4 seconds
Short answer: PREFECT_API_URL does not point to a VM. It points to your Prefect API (Prefect Cloud workspace or your self-hosted Prefect server). Workers on your VMs connect to that API, poll their assigned work pool(s), and pull runs to execute locally. You route runs to the "right VM" by choosing the work pool (and optional work queue filters), not by changing the API URL.

How the pieces fit together:
- PREFECT_API_URL: the orchestration backend endpoint (Cloud workspace URL or your self-hosted server URL).
- PREFECT_API_KEY: credentials to authenticate to that API.
- Work pool: the logical queue your deployment targets (set via `work_pool_name=...` in `.deploy()`).
- Worker: a process you run on each VM. It uses PREFECT_API_URL and PREFECT_API_KEY to connect to the API and poll its work pool for new runs.

Flow of a run:
1) Your CI (GitHub Actions) uses PREFECT_API_URL/KEY to register deployments with a given `work_pool_name` (e.g., prod-pool).
2) The worker running on your prod VM is started with `--pool prod-pool` (or equivalent) and the same PREFECT_API_URL/KEY. It polls the API and picks up jobs from that pool to run on that VM.
3) A worker on a dev VM does the same for `dev-pool`.

So how do we "change" where things run?
- You don't change PREFECT_API_URL per VM. All actors talk to the same API for a given environment/workspace.
- You change the target by:
  - Using different work pools (dev-pool vs prod-pool), and
  - Running workers on the appropriate VM(s) tied to those pools.

How to set this dynamically in CI:
- Store environment-specific API URL and API Key in GitHub Secrets (or GitHub Environments).
- In each job, export the right values, then run your deployment script.

Example patterns:
- Single workspace, two pools (common in Prefect Cloud)
  - Both dev and prod jobs use the same PREFECT_API_URL (same workspace) and API key with proper permissions.
  - Dev job sets `WORK_POOL=dev-pool` and `DEPLOY_ENV=dev`.
  - Prod job sets `WORK_POOL=prod-pool` and `DEPLOY_ENV=prod`.
  - Your script uses those values to create deployments that route to the right pool/VM.
- Separate workspaces (or separate self-hosted servers)
  - Dev job uses dev workspace/server URL and API key.
  - Prod job uses prod workspace/server URL and API key.
  - This isolates state and UI between environments.

Concrete GitHub Actions sketch
- Single workspace, two pools:
jobs:
  deploy-dev:
    env:
      PREFECT_API_URL: ${{ secrets.PREFECT_API_URL }}     # same for dev/prod if same workspace
      PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
      DEPLOY_ENV: dev
      WORK_POOL: dev-pool
    steps:
      - uses: actions/checkout@v4
      - run: pip install -U "prefect>=3"
      - run: python deploy.py

  deploy-prod:
    env:
      PREFECT_API_URL: ${{ secrets.PREFECT_API_URL }}
      PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
      DEPLOY_ENV: prod
      WORK_POOL: prod-pool
      # optionally DEPLOY_BRANCH=main and use GITHUB_SHA for pinning
    steps:
      - uses: actions/checkout@v4
      - run: pip install -U "prefect>=3"
      - run: python deploy.py
Worker startup (on each VM):
- Dev VM:
export PREFECT_API_URL=...    # same API as CI uses
export PREFECT_API_KEY=...
prefect worker start --pool dev-pool
- Prod VM:
export PREFECT_API_URL=...
export PREFECT_API_KEY=...
prefect worker start --pool prod-pool
Key takeaways:
- PREFECT_API_URL selects which orchestration backend you're talking to, not a compute VM.
- Route to the correct VM by targeting the correct work pool; workers on those VMs listen to their pool.
- Use CI env vars/secrets to choose the pool and any parameters (e.g., `env=prod`) when you run your deployment script.

Docs for deeper dives:
- Settings: PREFECT_API_URL and API keys
- Work pools overview
a
@Marvin I don’t understand. I have two VMs at the moment, both with prefect installed. If I want to deploy to one, I set prefect_api_url to the correct api endpoint
m
thought for 183.5 seconds
I see the confusion. PREFECT_API_URL does not point to a VM. It points to your Prefect API (Prefect Cloud workspace or your self-hosted Prefect server). Your workers on each VM connect to that API and poll a specific work pool. You route work to a VM by targeting the pool its worker listens to, not by changing the API URL.

Two common setups:

1) Single API (most common: one Cloud workspace or one self-hosted server)
- Both VMs use the same PREFECT_API_URL/KEY (the same orchestration backend).
- VM A runs a worker for dev, VM B runs a worker for prod.
- Your deployments specify `work_pool_name=dev-pool` or `work_pool_name=prod-pool` so the right worker/VM picks them up.

Worker startup on each VM:
# VM A (dev)
export PREFECT_API_URL=...   # same API as CI uses
export PREFECT_API_KEY=...
prefect work-pool create dev-pool --type process || true
prefect worker start --pool dev-pool

# VM B (prod)
export PREFECT_API_URL=...
export PREFECT_API_KEY=...
prefect work-pool create prod-pool --type process || true
prefect worker start --pool prod-pool
Your CI just sets which pool to deploy to:
# Dev job
export PREFECT_API_URL=...     # same API
export PREFECT_API_KEY=...
export WORK_POOL=dev-pool
python deploy.py

# Prod job
export PREFECT_API_URL=...     # same API
export PREFECT_API_KEY=...
export WORK_POOL=prod-pool
python deploy.py
2) Separate APIs (two Cloud workspaces or two self-hosted servers)
- VM A (dev) and VM B (prod) each point to different PREFECT_API_URL/KEY.
- CI jobs use different API URLs/keys to register deployments to each environment.
- This isolates state between environments, but it's not required just to route to different VMs.

Key mental model:
- PREFECT_API_URL = "Where is the Prefect control plane?" (Cloud/workspace or server)
- work_pool_name = "Which queue should receive this deployment's runs?"
- Worker on a VM = "Listener" for one or more pools; it pulls and executes runs on that VM

If you want to target "one VM", use a pool that only that VM's worker listens to. Do not change the API URL unless you're switching to a different Prefect control plane.

Docs to confirm:
- Settings (PREFECT_API_URL, PREFECT_API_KEY)
- Work pools overview
- Worker polling and health
- Creating deployments

If you can share whether you're using Prefect Cloud or self-hosted (and whether you want one API or separate APIs per env), I can give you the exact commands and a ready-to-copy GitHub Actions workflow.
a
@Marvin I’m using self hosted. For example, if I want to access the prefect instance running on VM 1 which is on port 4200, I would set prefect api url to localhost 4200 in order to deploy to it. For VM 2 which has prefect on port 4300, I would set prefect api url to localhost 4300 etc.
m
thought for 59.0 seconds
I think we're talking past each other a bit, so let me split this into the two possible setups you might be using. First, one important correction: PREFECT_API_URL should include the /api suffix, e.g. http://HOST:PORT/api. And "localhost" only works from the same machine that is running the server process.

Option A: one Prefect server (recommended)
- Run a single Prefect server (control plane) on one host, e.g. VM1 on port 4200.
- Both VMs run workers that connect to that one API URL.
- You route runs to a specific VM by choosing the `work_pool_name` that the worker on that VM is polling.

What it looks like:
- Server on VM1:
# On VM1
prefect server start --host 0.0.0.0 --port 4200
# API will be at http://VM1_HOST_OR_DNS:4200/api
- Workers on each VM:
# On VM1 (prod for example)
export PREFECT_API_URL=http://VM1_HOST_OR_DNS:4200/api
export PREFECT_API_KEY=...  # if you require auth
prefect work-pool create prod-pool --type process || true
prefect worker start --pool prod-pool

# On VM2 (dev for example)
export PREFECT_API_URL=http://VM1_HOST_OR_DNS:4200/api
export PREFECT_API_KEY=...
prefect work-pool create dev-pool --type process || true
prefect worker start --pool dev-pool
- CI or your deploy script registers deployments to the right pool:
# dev job
export PREFECT_API_URL=http://VM1_HOST_OR_DNS:4200/api
export PREFECT_API_KEY=...
python deploy.py  # uses work_pool_name="dev-pool"

# prod job
export PREFECT_API_URL=http://VM1_HOST_OR_DNS:4200/api
export PREFECT_API_KEY=...
python deploy.py  # uses work_pool_name="prod-pool"
Result: same API URL for everything; workers determine "which VM" actually runs the flow via their pool. You do not switch API URLs to target a VM.

Option B: multiple Prefect servers (one per VM)
- You can run a separate Prefect server on each VM (e.g., VM1 on 4200, VM2 on 4300). In that case you have two control planes, two UIs, two sets of deployments.
- To "deploy to VM1's server," use PREFECT_API_URL=http://VM1_HOST_OR_DNS:4200/api.
- To "deploy to VM2's server," use PREFECT_API_URL=http://VM2_HOST_OR_DNS:4300/api.

Notes for this setup:
- "localhost:4200" only works if you run the deploy script on VM1 itself. From CI or another machine, use the VM's network address/DNS instead of localhost.
- With two servers, state is isolated — deployments/runs on VM1's server won't appear on VM2's UI.

Which should you use?
- Most teams prefer Option A (one server, multiple workers) because it gives a single pane of glass and simpler routing via work pools. Use dev-pool/prod-pool (or queues with filters) to route work to the VM you want.
- Option B is fine if you explicitly want isolation between environments, but remember to use full hostnames instead of localhost when deploying from outside the VM.

Helpful references:
- Self-hosted server CLI
- Settings (PREFECT_API_URL)
- Work pools and routing
- Worker polling behavior
- Creating deployments

If you tell me which model you want (one server vs two), I'll give you exact env vars and commands tailored to your VM hostnames and ports.
a
@Marvin Option 2. Complete separation. Are variables useful in setting up this dev/prod process? I mention localhost because we use SSH tunnelling.
m
thought for 39.0 seconds
Got it — Option 2 with complete separation and SSH tunneling is totally fine.

Key points for your setup:
- You are running two separate Prefect servers (control planes): e.g., VM1 on 4200 and VM2 on 4300.
- To deploy to VM1 from your laptop/CI, set `PREFECT_API_URL` to `http://localhost:4200/api` after opening an SSH tunnel to VM1. For VM2, tunnel to 4300 and set `http://localhost:4300/api`.
- Each server has its own workers, work pools, deployments, Variables, and UI — nothing is shared between them.

Server and worker on each VM:
- VM1:
# Server
prefect server start --host 0.0.0.0 --port 4200

# Worker (process type is simplest on a VM)
export PREFECT_API_URL=http://localhost:4200/api
prefect work-pool create default-pool --type process || true
prefect worker start --pool default-pool
- VM2:
# Server
prefect server start --host 0.0.0.0 --port 4300

# Worker
export PREFECT_API_URL=http://localhost:4300/api
prefect work-pool create default-pool --type process || true
prefect worker start --pool default-pool
Deploy script (same codebase for both)
- You'll point the script to a specific server by switching `PREFECT_API_URL` via the SSH tunnel. Example (still using GitHub and branch pinning):
# deploy.py
import os
from prefect import flow
from prefect.runner.storage import GitRepository

REPO_URL = "<https://github.com/org/repo.git>"
ENTRYPOINTS = [
    ("pipelines/pipeline_one.py:my_flow", "pipeline-one"),
    ("pipelines/pipeline_two.py:my_flow", "pipeline-two"),
]

def deploy_all(env: str, branch: str, work_pool: str, commit_sha: str | None = None):
    source = GitRepository(url=REPO_URL, branch=branch, commit_sha=commit_sha)
    for entrypoint, short_name in ENTRYPOINTS:
        flow.from_source(source=source, entrypoint=entrypoint).deploy(
            name=f"{short_name}-{env}",
            work_pool_name=work_pool,
            parameters={"env": env},   # optional in Option 2, see notes below
            tags=[env],
        )

if __name__ == "__main__":
    env = os.getenv("DEPLOY_ENV", "dev")
    branch = os.getenv("DEPLOY_BRANCH", "main")
    commit_sha = os.getenv("GITHUB_SHA") if env == "prod" else None
    work_pool = os.getenv("WORK_POOL", "default-pool")
    deploy_all(env=env, branch=branch, work_pool=work_pool, commit_sha=commit_sha)
Using SSH tunneling from CI (or locally)
- You can forward a local port to each remote server and then set `PREFECT_API_URL` to localhost for that port. Example GitHub Actions jobs:
```
# .github/workflows/deploy.yml
name: Deploy Prefect (Self-hosted via SSH tunnels)
on:
  push:
    branches: ["**"]
jobs:
  deploy-dev:
    if: github.ref != 'refs/heads/main'
    runs-on: ubuntu-latest
    env:
      DEPLOY_ENV: dev
      DEPLOY_BRANCH: main
      WORK_POOL: default-pool
    steps:
      - uses: actions/checkout@v4
      # Add your SSH key
      - uses: webfactory/ssh-agent@v0.9.0
        with:
          ssh-private-key: ${{ secrets.SSH_KEY }}
      # Trust the host key (or manage known_hosts explicitly)
      - run: |
          ssh -o StrictHostKeyChecking=no user@VM1_HOST "echo ok"
      # Open tunnel to VM1: forward a LOCAL port (15420) to VM1:4200
      - run: |
          nohup ssh -N -L 15420:localhost:4200 user@VM1_HOST >/dev/null 2>&1 &
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -U "prefect>=3"
      # Point to the tunneled API and deploy
      - run: |
          export PREFECT_API_URL=http://localhost:15420/api
          python deploy.py

  deploy-prod:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    env:
      DEPLOY_ENV: prod
      DEPLOY_BRANCH: main
      WORK_POOL: default-pool
    steps:
      - uses: actions/checkout@v4
      - uses: webfactory/ssh-agent@v0.9.0
        with:
          ssh-private-key: ${{ secrets.SSH_KEY }}
      - run: |
          ssh -o StrictHostKeyChecking=no user@VM2_HOST "echo ok"
          nohup ssh -N -L 15430:localhost:4300 user@VM2_HOST >/dev/null 2>&1 &
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -U "prefect>=3"
      - run: |
          export PREFECT_API_URL=http://localhost:15430/api
          python deploy.py
```

Are Variables useful here?
- Yes — Variables shine in Option 2 because each server has its own Variable store, so you can keep the same variable names with different values per environment.
  - Example: define a `DB_URL` Variable on the dev server pointing to the dev DB; on the prod server set `DB_URL` to the prod DB. Your flow code stays the same in both environments.

Flow code using Variables:
```
from prefect import flow, get_run_logger
from prefect.variable import get as get_variable

@flow
def my_flow(env: str = "dev"):
    log = get_run_logger()
    db_url = get_variable("DB_URL")  # resolves per-server
    log.info(f"env={env}, db_url={db_url}")
    # connect to db_url, etc.
```

When to use what:
- Prefect Variables: centralize per-environment config in the server; same names, different values; easy to read in flows. Great for non-secret config (or secrets if you're okay with server visibility).
- Blocks (Secrets/Custom): for secrets/credentials; define separate blocks on each server with the same name, or name them per env. Point flows to the block at runtime.
- OS environment variables: quick and local to workers or CI; good for library toggles or container env.
- Flow parameters: for occasional runtime overrides (e.g., manual backfills, testing). In Option 2, the `env` parameter becomes optional because your server already defines the environment.

Best practices for Option 2:
- Keep the same code in dev and prod servers; avoid `if env == "prod": ...` in code where possible. Let Variables/Blocks provide the right values per server.
- Pin prod deployments to `GITHUB_SHA` for reproducibility; use a branch for dev.
- Use separate schedules, concurrency limits, and notifications per server as needed.

Docs:
- Self-hosted server CLI
- Settings (PREFECT_API_URL)
- Work pools and routing
- Creating deployments

If you share your VM hostnames/ports and preferred variable names, I can tailor the Actions workflow and a quick script to create Variables/Blocks on each server.
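As a small follow-up to the Blocks point under "When to use what" above, here is a minimal sketch of reading a per-server Secret block (an assumption for illustration: a Secret block named `db-password` has been created on each server, same name but a different value per environment):

from prefect import flow, get_run_logger
from prefect.blocks.system import Secret

@flow
def my_secret_aware_flow():
    log = get_run_logger()
    # Loads whichever value this server's "db-password" Secret block holds
    db_password = Secret.load("db-password").get()
    log.info("Loaded the DB password for this environment")
    # use db_password to build a connection, etc.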
a
@Marvin So in the dev VM Prefect instance, I can create a variable called environment and give it the value "dev", and in the prod instance do the same but give it the value "prod"? What would the full development process look like if a developer wanted to create a new pipeline and edit an existing one? What would their exact steps be when it comes to running files, deploying stuff, and GitHub?
m
thought for 230.4 seconds
Yes — Variables are perfect for this with Option 2. On each server you can create the same Variable names with different values, e.g. `environment=dev` on the dev server and `environment=prod` on the prod server. Your flows read the Variable at runtime and behave accordingly.

Create/read Variables:
- Set on each server (over SSH tunnel to the correct VM):
# Dev server (tunnel to VM1 -> localhost:4200)
export PREFECT_API_URL=http://localhost:4200/api
prefect variable set environment dev
prefect variable set DB_URL postgresql://dev-db/...

# Prod server (tunnel to VM2 -> localhost:4300)
export PREFECT_API_URL=http://localhost:4300/api
prefect variable set environment prod
prefect variable set DB_URL postgresql://prod-db/...
- In flows:
from prefect import flow, get_run_logger
from prefect.variables import Variable

@flow
def my_flow():
    log = get_run_logger()
    env = Variable.get("environment", default="dev")  # default helpful for local runs w/o API
    db_url = Variable.get("DB_URL", default=None)
    log.info(f"env={env}, db_url={db_url}")
    # branch behavior based on env, connect to db_url, etc.
A practical end-to-end workflow (self-hosted, fully separated dev/prod)

0) One-time setup on each VM
- Start server:
# VM1 (dev)
prefect server start --host 0.0.0.0 --port 4200

# VM2 (prod)
prefect server start --host 0.0.0.0 --port 4300
- Start a worker and pool (process type is simplest):
# Dev
export PREFECT_API_URL=http://localhost:4200/api
prefect work-pool create default-pool --type process || true
prefect worker start --pool default-pool

# Prod
export PREFECT_API_URL=http://localhost:4300/api
prefect work-pool create default-pool --type process || true
prefect worker start --pool default-pool
- Add per-environment Variables/Blocks (like above).

1) Developer creates a new pipeline
- Code:
  - Create a new flow file, e.g. `pipelines/new_pipeline.py` with `@flow`.
  - Add a clear entrypoint, e.g. `pipelines/new_pipeline.py:my_new_flow`.
  - Make the flow read Variables (e.g., `environment`, `DB_URL`) with sensible defaults for local runs.
- Local testing:
  - Run the flow directly:
python -c "from pipelines.new_pipeline import my_new_flow; my_new_flow()"
If you want it to read real server Variables locally, open an SSH tunnel and set `PREFECT_API_URL` to the dev server before running. Otherwise rely on `default=` values for Variables during local runs.
- Deploy to dev server (registers the deployment):
  - Open an SSH tunnel to VM1 (example): `ssh -N -L 15420:localhost:4200 user@VM1 &`
  - In your repo:
export PREFECT_API_URL=http://localhost:15420/api
pip install -U "prefect>=3"
python deploy.py    # your script that calls flow.from_source(...).deploy(...)
Ensure your new entrypoint is included in your `ENTRYPOINTS` list in `deploy.py`, targeting `work_pool_name="default-pool"` on the dev server.
- Test the deployment on dev:
  - Trigger a run:
prefect deployment run "my-new-flow/my-new-deployment" --watch
  - Confirm the dev worker picks it up and the run is successful.
- Review & GitHub:
  - Commit and open a PR. Your team reviews and approves.
- Promote to prod:
  - Merge to main. Then create the prod deployment on the prod server.
  - Open a tunnel to VM2: `ssh -N -L 15430:localhost:4300 user@VM2 &`
  - Optionally pin to the merge commit for reproducibility by passing `commit_sha` in `deploy.py` (e.g., from `GITHUB_SHA` if running in CI).
  - Run:
export PREFECT_API_URL=http://localhost:15430/api
python deploy.py
  - Trigger a test run in prod when appropriate.

2) Developer edits an existing pipeline
- Code changes + local testing as above.
- Redeploy to dev:
  - If your dev deployment uses `branch=...` in `GitRepository`, the latest commit will be pulled at run time (the default pull_interval is 60s). You can re-run `python deploy.py` if you changed deployment metadata (name, pool, parameters, schedules).
- Validate on dev (trigger a run, inspect logs).
- Merge to main.
- Redeploy to prod with `commit_sha` pinned to the merge commit (update deployments via `python deploy.py` against the prod server). This gives deterministic prod runs.

3) CI/CD (optional but recommended)
- Two jobs in GitHub Actions (or your CI):
  - Dev job on PRs or non-main branches: open an SSH tunnel to VM1, set `PREFECT_API_URL=http://localhost:<tunnel-port>/api`, and run `python deploy.py`.
  - Prod job on main/tag: open an SSH tunnel to VM2, set `PREFECT_API_URL=http://localhost:<tunnel-port>/api`, export `GITHUB_SHA`, and run `python deploy.py` so it pins prod to that commit.
- This ensures deployments are registered in the correct server, and prod is reproducible.

Notes and tips:
- Use the same Variable names on both servers; different values per server keep code identical across environments.
- Prefer Variables for non-sensitive config, Blocks (e.g., Secret) for credentials.
- If your repo is private, pass `credentials` to `GitRepository` (e.g., a token). With public repos you can omit credentials.
- Always include `/api` in `PREFECT_API_URL` and point localhost to your SSH tunnel's local port.
- To run a deployment on demand: `prefect deployment run "flow/deployment" --watch` (verified with `prefect deployment run --help`).
- To deploy via CLI instead of Python, you can use `prefect deploy` from a `prefect.yaml` project file (we verified `prefect deploy --help` exists). Since you're already programmatic with `flow.from_source(...).deploy(...)`, either path is fine.

Docs:
- Variables
- Creating deployments
- Self-hosted server CLI
- Work pools and workers

If you share your repo owner/name, the exact flow entrypoints, and whether you want to pin prod to commits, I can tailor a `deploy.py` and a GitHub Actions workflow you can drop in.
a
@Marvin When it comes to work pools and workers: is it typical to just have one worker looking at a work pool? So a process work pool work-pool-a, then on a different VM, have one worker polling it and executing code on that VM? If I want to distribute my flows across different VMs (to be executed on different VMs), what is the best approach?
m
thought for 149.2 seconds
Great question. Short answer: it's common (and recommended) to run multiple workers per work pool. One worker per pool is fine for small workloads, but you can start additional workers on other VMs to scale out; Prefect will naturally distribute flow runs across all available workers in the pool.

How to distribute flow runs across different VMs:
- Create a work pool for your chosen infrastructure (process, docker, kubernetes, ecs, etc.)
- Start one worker on each VM and point them all at the same pool
- Deploy your flows to that pool; runs will be picked up by whichever worker is available

Example (pull-based pool with docker workers, but "process" works similarly):
# 1) Create a docker work pool
prefect work-pool create --type docker vm-pool

# 2) On VM1
prefect worker start --pool vm-pool --limit 5

# 3) On VM2
prefect worker start --pool vm-pool --limit 5
Then, deploy your flow to the pool:
from prefect import flow

@flow
def my_flow():
    ...

if __name__ == "__main__":
    # Option A: deploy from your code
    my_flow.deploy(
        name="prod",
        work_pool_name="vm-pool",
        tags=["team-a"],
        job_variables={
            # optional per-deployment overrides (env, image, etc.)
        },
    )
Notes and best practices:
- One vs many workers: Multiple workers can watch the same pool and will load-balance automatically. Workers are lightweight and stateless, so scaling horizontally across VMs is normal.
- When to create multiple pools: Use separate pools for different infrastructure types or environments (e.g., one for docker on VMs, another for k8s, another for dev vs prod). Within a single infra type/env, a single pool with multiple workers is typical.
- Routing/pinning with work queues: Within a pool, create work queues to route specific flows using tag/deployment filters. Point a worker to poll only certain queues if you need stickiness (e.g., "Team A" flows only run on VM1/VM2).
- Concurrency controls:
  - Per-worker: `prefect worker start --limit N`
  - Per-queue: set queue concurrency limits (helps isolate workloads)
  - Per-pool: a pool-level concurrency limit caps the total concurrent runs across all workers
- Worker types:
  - process: simplest; runs in a local process on the VM
  - docker: runs each flow in a container on the VM; good isolation
  - kubernetes/ecs/etc.: workers submit per-run jobs to those backends
- Push vs pull pools: For your VM scenario, use pull pools (workers required). Push/managed pools don't need workers, but aren't for executing on your own VMs.

Helpful docs:
- Work pools (Prefect 3)
- Workers (Prefect 3)
- Customize job variables
- Work pool CLI
- Work queue CLI

CLI verification note: I attempted to run `--help` for the commands above but the check timed out in my environment. These are the standard Prefect 3.x commands; you can confirm locally with `prefect work-pool create --help`, `prefect worker start --help`, and `prefect work-queue --help`.

If you can share:
- your preferred worker type (process vs docker vs k8s/etc.)
- how many VMs and whether you need strict pinning (certain flows to certain VMs)
I can suggest an exact pool/queue/worker layout and sample configs.
a
@Marvin I only plan on using process work pools. Basically I want to have a dev/prod setup. So I guess a dev work pool with certain configurations set (like an environment variable set to dev) and a prod work pool. I might want the option to specify that certain deployments, dev or prod, should run on different VMs so one VM doesn't get overloaded. Looking for the simplest approach to do this.
m
thought for 84.8 seconds
Perfect — process workers make this really straightforward.

Recommended simplest setup:
- Create two process work pools: one for dev and one for prod.
- Start one or more workers per pool, one on each VM that should handle that environment.
- Use per-worker concurrency limits to keep any single VM from getting overloaded.
- Set environment variables at the pool or deployment level so dev/prod is explicit and reproducible.

Example:
# Create pools
prefect work-pool create --type process dev-process
prefect work-pool create --type process prod-process

# On Dev VM A (limit concurrency on this box)
prefect worker start --pool dev-process --limit 2

# On Dev VM B
prefect worker start --pool dev-process --limit 2

# On Prod VM A
prefect worker start --pool prod-process --limit 4
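If you also want to cap the total number of concurrent runs across a whole pool (not just per worker), work pools support a pool-level limit; a sketch, assuming the `set-concurrency-limit` subcommand is available in your Prefect version (confirm with `prefect work-pool --help`):

# Cap the dev pool at 4 concurrent flow runs across all dev workers
prefect work-pool set-concurrency-limit dev-process 4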
Deploy with explicit env per environment:
- Pool-level (UI): set base job variables for each pool, e.g. `env: {"ENV": "dev"}` or `{"ENV": "prod"}`.
- Or per-deployment (code):
from prefect import flow

@flow
def my_flow():
    ...

if __name__ == "__main__":
    # Dev deployment
    my_flow.deploy(
        name="dev",
        work_pool_name="dev-process",
        job_variables={"env": {"ENV": "dev"}},  # per-run environment vars
    )

    # Prod deployment
    my_flow.deploy(
        name="prod",
        work_pool_name="prod-process",
        job_variables={"env": {"ENV": "prod"}},
    )
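For completeness, a minimal sketch of how a flow might read that per-run environment variable (the `ENV` name matches the `job_variables` above; with a process work pool it is injected into the flow-run process):

import os
from prefect import flow

@flow
def env_aware_flow():
    # "ENV" comes from the deployment's job variables (or the pool's base job template)
    env = os.environ.get("ENV", "dev")
    if env == "prod":
        ...  # talk to prod resources
    else:
        ...  # talk to dev/sandbox resources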
How to spread runs across VMs:
- Easiest: just start multiple workers on different VMs for the same pool. Prefect will balance runs across the available workers automatically. Use `--limit` per worker to cap concurrency on each VM.
- If you need hard pinning (certain deployments always on certain VMs):
  - Use work queues within the pool. Create one queue per VM (e.g., `dev-vm1`, `dev-vm2`) with tag-based filters.
  - Tag deployments (e.g., `tags=["vm1"]`) and configure the queue to pull only that tag.
  - Start each worker to poll only its queue so those deployments always land on the intended VM.

Which to choose?
- If your goal is simply "don't overload a VM," use one pool per env + multiple workers with `--limit`. It's the simplest and still balances well.
- If you need strict routing of specific deployments to specific VMs, add work queues and tag-based routing.

Useful docs:
- Work pools: Concepts: Work pools
- Workers: Concepts: Workers
- Customize job variables (env, etc.): How-to: Customize job variables
- Work pool CLI: CLI: work-pool
- Work queue CLI: CLI: work-queue

Want me to sketch the work-queue/tag option with exact commands and example filters, or is the "just balance with per-worker limits" approach enough for your use case?
a
@Marvin I don't think it's possible to configure queues to pull specific tags.
m
thought for 198.1 seconds
You can do this in Prefect 3.x — work queues support filters by tags (and/or specific deployment IDs). That's how you can "pin" certain deployments to specific VMs while still keeping a simple dev/prod pool layout.

Two simple patterns for process workers:

A) Simplest (no hard pinning)
- Pools: dev-process, prod-process
- Start 1+ workers per pool on your VMs
- Use per-worker `--limit` to cap concurrency per VM
- Prefect will auto-balance runs across the workers in the pool

B) Simple pinning with queues (one pool per env, queues per VM)
- Keep one pool per env
- Create a work queue per VM (e.g., dev-vm1, dev-vm2)
- Tag your deployments (e.g., `tags=["vm1"]`, `tags=["vm2"]`)
- Set each queue's filter to match those tags
- Start each worker to poll only its queue, so those tagged runs always land on that VM

How to configure queue filters (tags):
- In the UI: Work Pools -> your pool -> Work Queues -> create/edit a queue and set a filter with tags.
- Or via the API (example with curl):
# Example: pin 'vm1' tagged runs to the dev-vm1 queue in the dev-process pool
curl -X PATCH "$PREFECT_API_URL/api/work_pools/dev-process/queues/dev-vm1" \
  -H "Authorization: Bearer $PREFECT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filter": { "tags": ["vm1"] },
    "concurrency_limit": 2,
    "priority": 1
  }'
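To double-check the result, you can read the queue back and inspect its filter; a sketch, assuming the read endpoint mirrors the update path used above:

# Read the queue definition (including its filter) back from the API
curl "$PREFECT_API_URL/api/work_pools/dev-process/queues/dev-vm1" \
  -H "Authorization: Bearer $PREFECT_API_KEY"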
Deployments with tags (dev/prod shown too):
from prefect import flow

@flow
def my_flow():
    ...

# Dev deployment, pinned to vm1 via tag
my_flow.deploy(
    name="dev-vm1",
    work_pool_name="dev-process",
    tags=["dev", "vm1"],
    job_variables={"env": {"ENV": "dev"}}
)

# Prod deployment, pinned to vm2 via tag
my_flow.deploy(
    name="prod-vm2",
    work_pool_name="prod-process",
    tags=["prod", "vm2"],
    job_variables={"env": {"ENV": "prod"}}
)
Start workers:
# Dev VM1
prefect worker start --pool dev-process --limit 2  # ideally also target the dev-vm1 queue

# Dev VM2
prefect worker start --pool dev-process --limit 2

# Prod VM2
prefect worker start --pool prod-process --limit 4
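For strict pinning, a hedged sketch of a queue-scoped worker, assuming your Prefect version exposes a `--work-queue` option on `prefect worker start` (confirm with `--help`, as noted in the tip below):

# Dev VM1: only pull runs from the dev-vm1 queue within the dev-process pool
prefect worker start --pool dev-process --work-queue dev-vm1 --limit 2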
Tip: start each worker to poll only its VM's queue for strict pinning (use the worker flag for queue selection; check `prefect worker start --help` to confirm the exact flag name in your version).

Docs and references:
- Work pools: Concepts: Work pools
- Workers: Concepts: Workers
- Work queue API (shows filter structure with tags): REST: Update work queue
- Work-queue CLI: CLI: work-queue
- Customize job variables (env): How-to: Job variables

If you prefer zero pinning and just want to avoid overload, stick with pattern A: one pool per env + multiple workers with per-worker `--limit`. If you want me to script the queue creation/update for you (UI or API), tell me your pool/VM names and target tags and I'll drop in exact commands.
a
@Marvin I’m in prefect server, and there’s no filter option under work queues
m
thought for 229.8 seconds
You're right — in Prefect Server (self-hosted) the UI does not expose work queue filters. The data model supports filters (by tags or deployment IDs), but the Server UI doesn't have a control for it today. You can still set filters via the REST API, or avoid filters entirely with a simple pool/worker layout.

Two simple options for your dev/prod + VM distribution:

A) UI-only, no filters (simplest)
- Create one process pool per environment: `dev-process`, `prod-process`
- Start one worker per VM against the appropriate pool and cap concurrency per VM with `--limit`
- Prefect will load-balance across workers in the same pool
- If you need strict pinning without filters, create separate pools per VM instead:
  - dev-vm1-process, dev-vm2-process (and similarly for prod)
  - Deployments target the specific pool you want to run on
  - Start one worker per VM on its matching pool

This is the easiest strict routing approach using only the Server UI.

B) One pool per env + queues per VM (requires API calls)
- Keep the `dev-process` and `prod-process` pools
- Create a queue per VM (e.g., `dev-vm1`, `dev-vm2`)
- Tag your deployments (e.g., `tags=["vm1"]`, `tags=["vm2"]`)
- Set each queue's filter via the REST API to match those tags
- Start each worker to poll only its queue for strict pinning

Example: set a queue filter via the REST API:
# Create a queue in the dev-process pool
curl -X POST "$PREFECT_API_URL/api/work_pools/dev-process/queues" \
  -H "Content-Type: application/json" \
  -d '{"name": "dev-vm1", "concurrency_limit": 2, "priority": 1}'

# Update the queue to filter by tag "vm1"
curl -X PATCH "$PREFECT_API_URL/api/work_pools/dev-process/queues/dev-vm1" \
  -H "Content-Type: application/json" \
  -d '{"filter": {"tags": ["vm1"]}}'
Notes:
- On Prefect Server, the Authorization header is typically not required (omit it unless you've enabled auth).
- After setting filters, start the worker on the VM and point it to the dev pool; configure it to poll only the `dev-vm1` queue so "vm1"-tagged runs land on that VM.

Deployments with tags (dev/prod + VM tag):
from prefect import flow

@flow
def my_flow():
    ...

# Dev pinned to vm1
my_flow.deploy(
    name="dev-vm1",
    work_pool_name="dev-process",
    tags=["dev", "vm1"],
    job_variables={"env": {"ENV": "dev"}}
)

# Prod pinned to vm2
my_flow.deploy(
    name="prod-vm2",
    work_pool_name="prod-process",
    tags=["prod", "vm2"],
    job_variables={"env": {"ENV": "prod"}}
)
Links:
- Work pools concept: Work pools (Prefect 3)
- REST API (update work queue; shows filter structure): Update work queue
- Workers concept: Workers (Prefect 3)
- Customize job variables: Customize job variables

If you'd prefer to avoid any API calls, I'd recommend the "separate pools per VM" approach for strict routing, or the "single pool per env + per-worker --limit" for simple balancing. If you're open to a one-time script, I can give you exact curl commands for your pool/queue names to set up tag filters.