Aaron
11/07/2025, 9:44 PM
from prefect import flow

github_url = "https://github.com/path/path.git"
work_pool_one = "my-work-pool"
ref = "master"

if __name__ == "__main__":
    deployment_name = "my-github-deployment_one"
    entrypoint = "pipelines/pipeline_one.py:my_flow"
    flow.from_source(
        source=github_url,
        entrypoint=entrypoint,
    ).deploy(
        name=deployment_name,
        work_pool_name=work_pool_one,
    )

    deployment_name = "my-github-deployment_two"
    entrypoint = "pipelines/pipeline_two.py:my_flow"
    flow.from_source(
        source=github_url,
        entrypoint=entrypoint,
    ).deploy(
        name=deployment_name,
        work_pool_name=work_pool_one,
    )
Marvin
11/07/2025, 9:44 PM
Marvin
11/07/2025, 9:48 PM
Yes — you can deploy multiple flows from one script by calling flow.from_source(...).deploy(...) multiple times. Your entrypoint strings are correct for flows living under subdirectories.
To target a specific Git branch (or a specific commit), pass a GitRepository to source and set branch (or commit_sha). Here’s a cleaned-up version of your script that reuses the same Git source and sets the branch:
from prefect import flow
from prefect.storage import GitRepository

GITHUB_URL = "https://github.com/path/path.git"
WORK_POOL = "my-work-pool"
BRANCH = "master"  # or "main", or use commit_sha="abc123"

if __name__ == "__main__":
    source = GitRepository(url=GITHUB_URL, branch=BRANCH)

    flow.from_source(
        source=source,
        entrypoint="pipelines/pipeline_one.py:my_flow",
    ).deploy(
        name="my-github-deployment_one",
        work_pool_name=WORK_POOL,
    )

    flow.from_source(
        source=source,
        entrypoint="pipelines/pipeline_two.py:my_flow",
    ).deploy(
        name="my-github-deployment_two",
        work_pool_name=WORK_POOL,
    )
Notes:
- entrypoint should be path/to/file.py:flow_function_name relative to the repo root.
- If you need reproducibility, prefer pinning commit_sha instead of a branch.
- For private repos, pass credentials via GitRepository(credentials=...) using a GitCredentials block.
If you want, I can show a compact loop-based version that generates many deployments from a list. Do you need to target a private repo or pin to a commit?
Aaron
11/07/2025, 9:50 PM
Aaron
11/07/2025, 9:53 PM
Marvin
11/07/2025, 9:53 PM
Marvin
11/07/2025, 9:54 PM
Your approach (multiple flow.from_source(...).deploy(...) calls in one script) is correct. Your ref variable isn’t used, though.
The error No module named 'prefect.storage' means your environment is running Prefect 2.x (or older). prefect.storage and GitRepository are Prefect 3.x features.
How to fix:
- Check your version:
python -c "import prefect; print(prefect.__version__)"
- Upgrade to Prefect 3.x (then restart your interpreter/worker):
pip install -U "prefect>=3.0.0"
# or conda-forge:
# conda install -c conda-forge prefect
Once you’re on 3.x, here’s a concise script for a public repo on a specific branch:
from prefect import flow
from prefect.storage import GitRepository

GITHUB_URL = "https://github.com/path/path.git"
WORK_POOL = "my-work-pool"
BRANCH = "master"  # or "main"

if __name__ == "__main__":
    source = GitRepository(url=GITHUB_URL, branch=BRANCH)

    flow.from_source(
        source=source,
        entrypoint="pipelines/pipeline_one.py:my_flow",
    ).deploy(
        name="my-github-deployment_one",
        work_pool_name=WORK_POOL,
    )

    flow.from_source(
        source=source,
        entrypoint="pipelines/pipeline_two.py:my_flow",
    ).deploy(
        name="my-github-deployment_two",
        work_pool_name=WORK_POOL,
    )
Notes:
- entrypoint is path/to/file.py:flow_function_name, relative to the repo root.
- If you later need reproducibility, use commit_sha="..." instead of branch=....
If you can’t upgrade right now and must stay on Prefect 2.x, say so and I’ll outline the 2.x approach, but I do recommend moving to 3.x.
Aaron
11/07/2025, 9:58 PM
Marvin
11/07/2025, 9:58 PM
Marvin
11/07/2025, 9:59 PM
prefect.runner.storage is the right import in 3.x for GitRepository.
Here’s a compact, loop-based pattern that generates many deployments from a list. It also shows how to pass an env parameter to each deployment so your flows can switch behavior for dev vs prod.
Example: single script, deploy multiple flows for a given environment
from prefect import flow
from prefect.runner.storage import GitRepository

REPO_URL = "https://github.com/org/repo.git"

# Each item: (entrypoint, short_name)
ENTRYPOINTS = [
    ("pipelines/pipeline_one.py:my_flow", "pipeline-one"),
    ("pipelines/pipeline_two.py:my_flow", "pipeline-two"),
    # add as many as you like...
]

def deploy_all(env: str, branch: str, work_pool: str, commit_sha: str | None = None):
    source = GitRepository(url=REPO_URL, branch=branch, commit_sha=commit_sha)
    for entrypoint, short_name in ENTRYPOINTS:
        flow.from_source(source=source, entrypoint=entrypoint).deploy(
            name=f"{short_name}-{env}",
            work_pool_name=work_pool,
            parameters={"env": env},  # flow reads this to switch dev/prod paths
            tags=[env],               # optional: helps filtering in UI
        )

if __name__ == "__main__":
    # Example: run one of these per environment (typically from CI)
    # Dev deployments (dev branch -> dev work pool)
    deploy_all(env="dev", branch="develop", work_pool="dev-pool")
    # Prod deployments (main branch -> prod work pool, optionally pin a commit for reproducibility)
    # deploy_all(env="prod", branch="main", work_pool="prod-pool", commit_sha=None)

And your flow(s) can accept the env parameter to toggle behavior:

from prefect import flow, get_run_logger

@flow
def my_flow(env: str = "dev"):
    logger = get_run_logger()
    logger.info(f"Running in env={env}")
    if env == "prod":
        # connect to prod DB/bucket/etc.
        ...
    else:
        # connect to dev/sandbox
        ...
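Once these deployments are registered, you can also trigger a run from Python rather than the UI — a minimal sketch, assuming the loop above registered a deployment reachable as "my-flow/pipeline-one-dev" (adjust to your actual flow and deployment names):

from prefect.deployments import run_deployment

if __name__ == "__main__":
    # Name format is "<flow name>/<deployment name>"
    flow_run = run_deployment(
        name="my-flow/pipeline-one-dev",
        parameters={"env": "dev"},  # optional override of the deployment's defaults
    )
    print(flow_run.id)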
Recommended development lifecycle
- Source control and branches
- Dev deployments point to a dev branch (e.g., develop) and use env="dev".
- Prod deployments point to main (or pin commit_sha for reproducibility) and use env="prod".
- Infra isolation via work pools
- Run a Prefect worker on your dev VM in a “dev” work pool (e.g., dev-pool).
- Run another worker on your prod VM in a “prod” work pool (e.g., prod-pool).
- Your deployments target the right VM by setting work_pool_name accordingly.
- Parameters vs environment variables
- Use deployment parameters (like {"env": "dev"}) for behavior flags.
- Use environment variables/blocks for secrets and connection strings that differ per environment.
- You can also tag deployments (e.g., ["dev"], ["prod"]) to help filter in the UI.
- CI/CD flow
- On push to the dev branch, your CI runs the script once for dev:
- deploy_all(env="dev", branch="develop", work_pool="dev-pool")
- On merge to main, your CI runs the script for prod:
- deploy_all(env="prod", branch="main", work_pool="prod-pool", commit_sha=<optional pin>)
- Ensure CI has PREFECT_API_URL and PREFECT_API_KEY set for the correct workspace.
- Only run the one block relevant to the environment in a given CI job to avoid creating both dev/prod at the same time.
- Optional hardening for prod
- Pin commit_sha in prod deployments for deterministic code.
- Add schedules, concurrency limits, and alerts to prod deployments (a scheduling sketch follows this list).
- Keep dev schedules paused or ad-hoc only.
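To illustrate the scheduling point above — a minimal sketch, assuming a daily prod schedule; the cron string, repo URL, and names are placeholders:

from prefect import flow
from prefect.runner.storage import GitRepository

source = GitRepository(url="https://github.com/org/repo.git", branch="main")

flow.from_source(source=source, entrypoint="pipelines/pipeline_one.py:my_flow").deploy(
    name="pipeline-one-prod",
    work_pool_name="prod-pool",
    cron="0 6 * * *",          # run daily at 06:00; omit for ad-hoc-only dev deployments
    parameters={"env": "prod"},
)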
Would you like me to tailor the script to your exact branch names and work pool names, or show a GitHub Actions example that runs this per-branch?
Aaron
11/07/2025, 10:04 PM
Marvin
11/07/2025, 10:04 PM
Marvin
11/07/2025, 10:06 PM
CI “runs the script for dev” by setting a couple of environment variables (e.g., DEPLOY_ENV, WORK_POOL) and then executing the same deployment script. Workers on different VMs listen on different work pools, so the same deployments go to the right place.
A compact, loop-based deployment script (single-branch friendly)
- Uses one code path, switches by DEPLOY_ENV
- Pins prod deployments to the commit that triggered CI (reproducible)
- Targets separate work pools for dev vs prod
# deploy.py
import os
from prefect import flow
from prefect.runner.storage import GitRepository

REPO_URL = "https://github.com/org/repo.git"

# (entrypoint, short_name)
ENTRYPOINTS = [
    ("pipelines/pipeline_one.py:my_flow", "pipeline-one"),
    ("pipelines/pipeline_two.py:my_flow", "pipeline-two"),
]

def deploy_all(env: str, branch: str, work_pool: str, commit_sha: str | None = None):
    source = GitRepository(url=REPO_URL, branch=branch, commit_sha=commit_sha)
    for entrypoint, short_name in ENTRYPOINTS:
        flow.from_source(source=source, entrypoint=entrypoint).deploy(
            name=f"{short_name}-{env}",
            work_pool_name=work_pool,
            parameters={"env": env},
            tags=[env],
        )

if __name__ == "__main__":
    # Defaults make local runs "dev"
    env = os.getenv("DEPLOY_ENV", "dev")         # dev or prod
    branch = os.getenv("DEPLOY_BRANCH", "main")  # single-branch default
    # Use the build's commit for prod pinning (GitHub sets GITHUB_SHA)
    commit_sha = os.getenv("GITHUB_SHA") if env == "prod" else None
    # Choose a work pool based on env unless explicitly provided
    work_pool = os.getenv("WORK_POOL", "dev-pool" if env == "dev" else "prod-pool")
    deploy_all(env=env, branch=branch, work_pool=work_pool, commit_sha=commit_sha)
Your flows read the parameter to switch behavior:
from prefect import flow, get_run_logger

@flow
def my_flow(env: str = "dev"):
    log = get_run_logger()
    log.info(f"Running with env={env}")
    if env == "prod":
        # prod connections / tables / buckets / etc.
        ...
    else:
        # dev/sandbox resources
        ...
How CI “runs the script for dev” vs prod
- CI pulls your repo and runs python deploy.py with environment variables that tell it which environment to deploy to, which work pool to use, and which branch/commit to reference.
- Your Prefect workers:
- A dev worker runs on your dev VM, connected to dev-pool.
- A prod worker runs on your prod VM, connected to prod-pool.
- When the script creates deployments with work_pool_name="dev-pool", those runs are picked up only by the dev worker; prod deployments target prod-pool and are picked up on the prod VM.
Example GitHub Actions workflow (single-branch model)
- “Dev” job runs on any push except main (you can also tie it to PRs).
- “Prod” job runs only on main, pins commit_sha automatically.
```
# .github/workflows/deploy.yml
name: Deploy Prefect
on:
  push:
    branches: ["**"]  # run on any branch; we'll gate jobs below

jobs:
  deploy-dev:
    if: github.ref != 'refs/heads/main'
    runs-on: ubuntu-latest
    env:
      PREFECT_API_URL: ${{ secrets.PREFECT_API_URL }}
      PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
      DEPLOY_ENV: dev
      DEPLOY_BRANCH: main  # single-branch model still deploys from main
      WORK_POOL: dev-pool
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -U "prefect>=3"
      - run: python deploy.py

  deploy-prod:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    env:
      PREFECT_API_URL: ${{ secrets.PREFECT_API_URL }}
```
Marvin
11/07/2025, 10:06 PM
- Single-branch (main only): prod pins its commit via GITHUB_SHA as above.
- Two-branch (develop + main)
- Pros: clearer separation; dev deploys from develop, prod deploys from `main`; safer by default.
- Cons: slightly more repo ceremony; occasional merge friction.
Either approach works; the key is:
- Isolate execution with separate work pools/VMs per environment.
- Drive behavior with a parameter like env, and keep secrets/config per environment (worker environment variables, Prefect Variables, or Blocks).
- Pin prod to a commit for reproducibility.
If you share your actual work pool names and whether you want single-branch or two-branch, I can tailor the deploy.py and the Actions workflow exactly to your setup.
Aaron
11/07/2025, 10:20 PM
Marvin
11/07/2025, 10:20 PM
Marvin
11/07/2025, 10:22 PM
A few definitions first:
- Deployment: what your script registers with the API; it targets a work pool (work_pool_name=... in .deploy()).
- Worker: a process you run on each VM. It uses PREFECT_API_URL and PREFECT_API_KEY to connect to the API and poll its work pool for new runs.
Flow of a run
1) Your CI (GitHub Actions) uses PREFECT_API_URL/KEY to register deployments with a given work_pool_name (e.g., prod-pool).
2) The worker running on your prod VM is started with --pool prod-pool (or equivalent) and the same PREFECT_API_URL/KEY. It polls the API and picks up jobs from that pool to run on that VM.
3) A worker on a dev VM does the same for dev-pool.
So how do we “change” where things run?
- You don’t change PREFECT_API_URL per VM. All actors talk to the same API for a given environment/workspace.
- You change the target by:
- Using different work pools (dev-pool vs prod-pool), and
- Running workers on the appropriate VM(s) tied to those pools.
How to set this dynamically in CI
- Store environment-specific API URL and API Key in GitHub Secrets (or GitHub Environments).
- In each job, export the right values, then run your deployment script. Example patterns:
Single workspace, two pools (common in Prefect Cloud)
- Both dev and prod jobs use the same PREFECT_API_URL (same workspace) and API key with proper permissions.
- Dev job sets WORK_POOL=dev-pool and DEPLOY_ENV=dev.
- Prod job sets WORK_POOL=prod-pool and DEPLOY_ENV=prod.
- Your script uses those values to create deployments that route to the right pool/VM.
Separate workspaces (or separate self-hosted servers)
- Dev job uses dev workspace/server URL and API key.
- Prod job uses prod workspace/server URL and API key.
- This isolates state and UI between environments.
Concrete GitHub Actions sketch
- Single workspace, two pools:
jobs:
  deploy-dev:
    env:
      PREFECT_API_URL: ${{ secrets.PREFECT_API_URL }}  # same for dev/prod if same workspace
      PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
      DEPLOY_ENV: dev
      WORK_POOL: dev-pool
    steps:
      - uses: actions/checkout@v4
      - run: pip install -U "prefect>=3"
      - run: python deploy.py

  deploy-prod:
    env:
      PREFECT_API_URL: ${{ secrets.PREFECT_API_URL }}
      PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
      DEPLOY_ENV: prod
      WORK_POOL: prod-pool
      # optionally DEPLOY_BRANCH=main and use GITHUB_SHA for pinning
    steps:
      - uses: actions/checkout@v4
      - run: pip install -U "prefect>=3"
      - run: python deploy.py
Worker startup (on each VM)
- Dev VM:
export PREFECT_API_URL=... # same API as CI uses
export PREFECT_API_KEY=...
prefect worker start --pool dev-pool
- Prod VM:
export PREFECT_API_URL=...
export PREFECT_API_KEY=...
prefect worker start --pool prod-pool
Key takeaways
- PREFECT_API_URL selects which orchestration backend you’re talking to, not a compute VM.
- Route to the correct VM by targeting the correct work pool; workers on those VMs listen to their pool.
- Use CI env vars/secrets to choose the pool and any parameters (e.g., env=prod) when you run your deployment script.
Docs for deeper dives:
- Settings: PREFECT_API_URL and API keys
- Work pools overview
Marvin
11/07/2025, 10:22 PM
Aaron
11/07/2025, 10:26 PM
Marvin
11/07/2025, 10:26 PM
Marvin
11/07/2025, 10:29 PM
1) Same API, two work pools (workers route the runs)
# VM A (dev)
export PREFECT_API_URL=... # same API as CI uses
export PREFECT_API_KEY=...
prefect work-pool create dev-pool --type process || true
prefect worker start --pool dev-pool
# VM B (prod)
export PREFECT_API_URL=...
export PREFECT_API_KEY=...
prefect work-pool create prod-pool --type process || true
prefect worker start --pool prod-pool
Your CI just sets which pool to deploy to:
# Dev job
export PREFECT_API_URL=... # same API
export PREFECT_API_KEY=...
export WORK_POOL=dev-pool
python deploy.py
# Prod job
export PREFECT_API_URL=... # same API
export PREFECT_API_KEY=...
export WORK_POOL=prod-pool
python deploy.py
2) Separate APIs (two Cloud workspaces or two self-hosted servers)
- VM A (dev) and VM B (prod) each point to different PREFECT_API_URL/KEY.
- CI jobs use different API URLs/keys to register deployments to each environment.
- This isolates state between environments, but it’s not required just to route to different VMs.
Key mental model
- PREFECT_API_URL = “Where is the Prefect control plane?” (Cloud/workspace or server)
- work_pool_name = “Which queue should receive this deployment’s runs?”
- Worker on a VM = “Listener” for one or more pools; it pulls and executes runs on that VM
If you want to target “one VM”, use a pool that only that VM’s worker listens to. Do not change the API URL unless you’re switching to a different Prefect control plane.
Docs to confirm
- Settings (PREFECT_API_URL, PREFECT_API_KEY)
- Work pools overview
- Worker polling and health
- Creating deployments
If you can share whether you’re using Prefect Cloud or self-hosted (and whether you want one API or separate APIs per env), I can give you the exact commands and a ready-to-copy GitHub Actions workflow.
Aaron
11/07/2025, 10:32 PM
Marvin
11/07/2025, 10:32 PM
Marvin
11/07/2025, 10:33 PM
Option A: one Prefect server (on VM1) with workers on each VM
- Start the server on VM1:
# On VM1
prefect server start --host 0.0.0.0 --port 4200
# API will be at http://VM1_HOST_OR_DNS:4200/api
- Workers on each VM:
# On VM1 (prod for example)
export PREFECT_API_URL=http://VM1_HOST_OR_DNS:4200/api
export PREFECT_API_KEY=... # if you require auth
prefect work-pool create prod-pool --type process || true
prefect worker start --pool prod-pool
# On VM2 (dev for example)
export PREFECT_API_URL=http://VM1_HOST_OR_DNS:4200/api
export PREFECT_API_KEY=...
prefect work-pool create dev-pool --type process || true
prefect worker start --pool dev-pool
- CI or your deploy script registers deployments to the right pool:
# dev job
export PREFECT_API_URL=http://VM1_HOST_OR_DNS:4200/api
export PREFECT_API_KEY=...
python deploy.py # uses work_pool_name="dev-pool"
# prod job
export PREFECT_API_URL=http://VM1_HOST_OR_DNS:4200/api
export PREFECT_API_KEY=...
python deploy.py # uses work_pool_name="prod-pool"
Result: same API URL for everything; workers determine “which VM” actually runs the flow via their pool. You do not switch API URLs to target a VM.
Option B: multiple Prefect servers (one per VM)
- You can run a separate Prefect server on each VM (e.g., VM1 on 4200, VM2 on 4300). In that case you have two control planes, two UIs, two sets of deployments.
- To “deploy to VM1’s server,” use PREFECT_API_URL=http://VM1_HOST_OR_DNS:4200/api.
- To “deploy to VM2’s server,” use PREFECT_API_URL=http://VM2_HOST_OR_DNS:4300/api.
Notes for this setup
- “localhost:4200” only works if you run the deploy script on VM1 itself. From CI or another machine, use the VM’s network address/DNS instead of localhost.
- With two servers, state is isolated — deployments/runs on VM1’s server won’t appear on VM2’s UI.
Which should you use?
- Most teams prefer Option A (one server, multiple workers) because it gives a single pane of glass and simpler routing via work pools. Use dev-pool/prod-pool (or queues with filters) to route work to the VM you want.
- Option B is fine if you explicitly want isolation between environments, but remember to use full hostnames instead of localhost when deploying from outside the VM.
Helpful references:
- Self-hosted server CLI
- Settings (PREFECT_API_URL)
- Work pools and routing
- Worker polling behavior
- Creating deployments
If you tell me which model you want (one server vs two), I’ll give you exact env vars and commands tailored to your VM hostnames and ports.
Aaron
11/07/2025, 10:37 PM
Marvin
11/07/2025, 10:37 PM
Marvin
11/07/2025, 10:38 PM
Yes — for VM1, set PREFECT_API_URL to http://localhost:4200/api after opening an SSH tunnel to VM1. For VM2, tunnel to 4300 and set http://localhost:4300/api.
- Each server has its own workers, work pools, deployments, Variables, and UI — nothing is shared between them.
Server and worker on each VM
- VM1:
# Server
prefect server start --host 0.0.0.0 --port 4200
# Worker (process type is simplest on a VM)
export PREFECT_API_URL=http://localhost:4200/api
prefect work-pool create default-pool --type process || true
prefect worker start --pool default-pool
- VM2:
# Server
prefect server start --host 0.0.0.0 --port 4300
# Worker
export PREFECT_API_URL=http://localhost:4300/api
prefect work-pool create default-pool --type process || true
prefect worker start --pool default-pool
Deploy script (same codebase for both)
- You’ll point the script to a specific server by switching PREFECT_API_URL via the SSH tunnel. Example (still using GitHub and branch pinning):
# deploy.py
import os
from prefect import flow
from prefect.runner.storage import GitRepository

REPO_URL = "https://github.com/org/repo.git"

ENTRYPOINTS = [
    ("pipelines/pipeline_one.py:my_flow", "pipeline-one"),
    ("pipelines/pipeline_two.py:my_flow", "pipeline-two"),
]

def deploy_all(env: str, branch: str, work_pool: str, commit_sha: str | None = None):
    source = GitRepository(url=REPO_URL, branch=branch, commit_sha=commit_sha)
    for entrypoint, short_name in ENTRYPOINTS:
        flow.from_source(source=source, entrypoint=entrypoint).deploy(
            name=f"{short_name}-{env}",
            work_pool_name=work_pool,
            parameters={"env": env},  # optional in Option 2, see notes below
            tags=[env],
        )

if __name__ == "__main__":
    env = os.getenv("DEPLOY_ENV", "dev")
    branch = os.getenv("DEPLOY_BRANCH", "main")
    commit_sha = os.getenv("GITHUB_SHA") if env == "prod" else None
    work_pool = os.getenv("WORK_POOL", "default-pool")
    deploy_all(env=env, branch=branch, work_pool=work_pool, commit_sha=commit_sha)
Using SSH tunneling from CI (or locally)
- You can forward a local port to each remote server and then set PREFECT_API_URL to localhost for that port. Example GitHub Actions jobs:
```
# .github/workflows/deploy.yml
name: Deploy Prefect (Self-hosted via SSH tunnels)
on:
  push:
    branches: ["**"]

jobs:
  deploy-dev:
    if: github.ref != 'refs/heads/main'
    runs-on: ubuntu-latest
    env:
      DEPLOY_ENV: dev
      DEPLOY_BRANCH: main
      WORK_POOL: default-pool
    steps:
      - uses: actions/checkout@v4
      # Add your SSH key
      - uses: webfactory/ssh-agent@v0.9.0
        with:
          ssh-private-key: ${{ secrets.SSH_KEY }}
      # Trust the host key (or manage known_hosts explicitly)
      - run: |
          ssh -o StrictHostKeyChecking=no user@VM1_HOST "echo ok"
      # Open tunnel to VM1: forward a LOCAL port (15420) to VM1:4200
      - run: |
          nohup ssh -N -L 15420:localhost:4200 user@VM1_HOST >/dev/null 2>&1 &
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -U "prefect>=3"
      # Point to the tunneled API and deploy
      - run: |
          export PREFECT_API_URL=http://localhost:15420/api
          python deploy.py

  deploy-prod:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    env:
      DEPLOY_ENV: prod
      DEPLOY_BRANCH: main
      WORK_POOL: default-pool
    steps:
      - uses: actions/checkout@v4
      - uses: webfactory/ssh-agent@v0.9.0
        with:
          ssh-private-key: ${{ secrets.SSH_KEY }}
      - run: |
          ssh -o StrictHostKeyChecking=no user@VM2_HOST "echo ok"
```
Marvin
11/07/2025, 10:38 PM
Are Variables useful here?
- Yes — Variables shine in Option 2 because each server has its own Variable store, so you can keep the same variable names with different values per environment.
- Example: define a `DB_URL` Variable on the dev server pointing to the dev DB; on prod server set `DB_URL` to the prod DB. Your flow code stays the same in both environments.
Flow code using Variables
from prefect import flow, get_run_logger
from prefect.variables import Variable

@flow
def my_flow(env: str = "dev"):
    log = get_run_logger()
    db_url = Variable.get("DB_URL")  # resolves per-server
    log.info(f"env={env}, db_url={db_url}")
    # connect to db_url, etc.
When to use what
- Prefect Variables: centralize per-environment config in the server; same names, different values; easy to read in flows. Great for non-secret config (or secrets if you’re okay with server visibility).
- Blocks (Secrets/Custom): for secrets/credentials; define separate blocks on each server with the same name, or name them per env. Point flows to the block at runtime (see the sketch after this list).
- OS environment variables: quick and local to workers or CI; good for library toggles or container env.
- Flow parameters: for occasional runtime overrides (e.g., manual backfills, testing). In Option 2, the env parameter becomes optional because your server already defines the environment.
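To illustrate the Blocks point above — a minimal sketch, assuming a Secret block named "db-password" has been created on each server with an environment-specific value (the block name is a placeholder):

from prefect import flow
from prefect.blocks.system import Secret

@flow
def my_flow():
    # Same block name on both servers; each server stores its own value.
    db_password = Secret.load("db-password").get()
    # use db_password to build your connection, etc.
    ...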
Best practices for Option 2
- Keep the same code in dev and prod servers; avoid if env == "prod": ... in code where possible. Let Variables/Blocks provide the right values per server.
- Pin prod deployments to GITHUB_SHA for reproducibility; use a branch for dev.
- Use separate schedules, concurrency limits, and notifications per server as needed.
Docs:
- Self-hosted server CLI
- Settings (PREFECT_API_URL)
- Work pools and routing
- Creating deployments
If you share your VM hostnames/ports and preferred variable names, I can tailor the Actions workflow and a quick script to create Variables/Blocks on each server.
Aaron
11/07/2025, 10:47 PM
Marvin
11/07/2025, 10:47 PM
Marvin
11/07/2025, 10:51 PM
Yes — a simple pattern is to set a Variable to environment=dev on the dev server and environment=prod on the prod server. Your flows read the Variable at runtime and behave accordingly.
Create/read Variables
- Set on each server (over SSH tunnel to the correct VM):
# Dev server (tunnel to VM1 -> localhost:4200)
export PREFECT_API_URL=http://localhost:4200/api
prefect variable set environment dev
prefect variable set DB_URL postgresql://dev-db/...
# Prod server (tunnel to VM2 -> localhost:4300)
export PREFECT_API_URL=http://localhost:4300/api
prefect variable set environment prod
prefect variable set DB_URL postgresql://prod-db/...
- In flows:
from prefect import flow, get_run_logger
from prefect.variables import Variable

@flow
def my_flow():
    log = get_run_logger()
    env = Variable.get("environment", default="dev")  # default helpful for local runs w/o API
    db_url = Variable.get("DB_URL", default=None)
    log.info(f"env={env}, db_url={db_url}")
    # branch behavior based on env, connect to db_url, etc.
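If you prefer Python over the CLI, Variables can also be written programmatically — a small sketch, assuming PREFECT_API_URL already points (via the tunnel) at the server you want to write to:

from prefect.variables import Variable

# Run once per server; each server keeps its own values under the same names.
Variable.set("environment", "dev", overwrite=True)
Variable.set("DB_URL", "postgresql://dev-db/...", overwrite=True)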
A practical end-to-end workflow (self-hosted, fully separated dev/prod)
0) One-time setup on each VM
- Start server:
# VM1 (dev)
prefect server start --host 0.0.0.0 --port 4200
# VM2 (prod)
prefect server start --host 0.0.0.0 --port 4300
- Start a worker and pool (process type is simplest):
# Dev
export PREFECT_API_URL=http://localhost:4200/api
prefect work-pool create default-pool --type process || true
prefect worker start --pool default-pool
# Prod
export PREFECT_API_URL=http://localhost:4300/api
prefect work-pool create default-pool --type process || true
prefect worker start --pool default-pool
- Add per-environment Variables/Blocks (like above).
1) Developer creates a new pipeline
- Code:
- Create a new flow file, e.g. pipelines/new_pipeline.py with @flow.
- Add a clear entrypoint, e.g. pipelines/new_pipeline.py:my_new_flow.
- Make the flow read Variables (e.g., environment, DB_URL) with sensible defaults for local runs (a sketch of such a file appears after these steps).
- Local testing:
- Run the flow directly:
python -c "from pipelines.new_pipeline import my_new_flow; my_new_flow()"
If you want it to read real server Variables locally, open an SSH tunnel and set PREFECT_API_URL to the dev server before running. Otherwise rely on default= values for Variables during local runs.
- Deploy to dev server (registers the deployment):
- Open SSH tunnel to VM1 (example): ssh -N -L 15420:localhost:4200 user@VM1 &
- In your repo:
export PREFECT_API_URL=http://localhost:15420/api
pip install -U "prefect>=3"
python deploy.py # your script that calls flow.from_source(...).deploy(...)
Ensure your new entrypoint is included in your ENTRYPOINTS list in deploy.py, targeting work_pool_name="default-pool" on the dev server.
- Test the deployment on dev:
- Trigger a run:
prefect deployment run "my-new-flow/my-new-deployment" --watch
- Confirm the dev worker picks it up and the run is successful.
- Review & GitHub:
- Commit and open a PR. Your team reviews and approves.
- Promote to prod:
- Merge to main. Then create the prod deployment on the prod server.
- Open tunnel to VM2: ssh -N -L 15430:localhost:4300 user@VM2 &
- Optionally pin to the merge commit for reproducibility by passing commit_sha in deploy.py (e.g., from GITHUB_SHA if running in CI).
- Run:
export PREFECT_API_URL=http://localhost:15430/api
python deploy.py
- Trigger a test run in prod when appropriate.
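Tying step 1 together — a minimal sketch of what pipelines/new_pipeline.py could look like (names and defaults are illustrative):

from prefect import flow, get_run_logger
from prefect.variables import Variable

@flow
def my_new_flow():
    log = get_run_logger()
    env = Variable.get("environment", default="dev")            # default keeps local runs working
    db_url = Variable.get("DB_URL", default="sqlite:///local.db")
    log.info(f"env={env}, db_url={db_url}")
    # ... pipeline logic ...

if __name__ == "__main__":
    my_new_flow()  # quick local test: python pipelines/new_pipeline.py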
2) Developer edits an existing pipeline
- Code changes + local testing as above.
- Redeploy to dev:
Marvin
11/07/2025, 10:51 PM
Because the deployment references branch=... in GitRepository, the latest commit will be pulled at run time (default pull_interval is 60s). You can re-run python deploy.py if you changed deployment metadata (name, pool, parameters, schedules).
- Validate on dev (trigger run, inspect logs).
- Merge to main.
- Redeploy to prod with commit_sha pinned to the merge commit (update deployments via python deploy.py against the prod server). This gives deterministic prod runs.
3) CI/CD (optional but recommended)
- Two jobs in GitHub Actions (or your CI):
- Dev job on PRs or non-main branches:
- Open SSH tunnel to VM1, set PREFECT_API_URL=http://localhost:<tunnel-port>/api, run python deploy.py.
- Prod job on main/tag:
- Open SSH tunnel to VM2, set PREFECT_API_URL=http://localhost:<tunnel-port>/api, export GITHUB_SHA, run python deploy.py so it pins prod to that commit.
- This ensures deployments are registered in the correct server, and prod is reproducible.
Notes and tips
- Use the same Variable names on both servers; different values per server keep code identical across environments.
- Prefer Variables for non-sensitive config, Blocks (e.g., Secret) for credentials.
- If your repo is private, pass credentials to GitRepository (e.g., a token); see the sketch after these notes. With public repos you can omit credentials.
- Always include /api in PREFECT_API_URL and point localhost to your SSH tunnel’s local port.
- To run a deployment on demand: prefect deployment run "flow/deployment" --watch (verified with prefect deployment run --help).
- To deploy via CLI instead of Python, you can use prefect deploy from a prefect.yaml project file (we verified prefect deploy --help exists). Since you’re already programmatic with flow.from_source(...).deploy(...), either path is fine.
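On the private-repo note above — a minimal sketch, assuming a Secret block named "github-token" exists on the server and that GitRepository accepts a credentials dict with an access token (repo URL and names are placeholders):

from prefect import flow
from prefect.blocks.system import Secret
from prefect.runner.storage import GitRepository

source = GitRepository(
    url="https://github.com/org/private-repo.git",
    branch="main",
    credentials={"access_token": Secret.load("github-token").get()},  # placeholder block name
)

flow.from_source(source=source, entrypoint="pipelines/pipeline_one.py:my_flow").deploy(
    name="pipeline-one-dev",
    work_pool_name="default-pool",
)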
Docs:
- Variables
- Creating deployments
- Self-hosted server CLI
- Work pools and workers
If you share your repo owner/name, the exact flow entrypoints, and whether you want to pin prod to commits, I can tailor a deploy.py and a GitHub Actions workflow you can drop in.
Aaron
11/08/2025, 6:37 PM
Marvin
11/08/2025, 6:37 PM
Marvin
11/08/2025, 6:40 PM
You can have multiple workers — one per VM — polling the same work pool; Prefect balances runs across them. For example:
# 1) Create a docker work pool
prefect work-pool create --type docker vm-pool
# 2) On VM1
prefect worker start --pool vm-pool --limit 5
# 3) On VM2
prefect worker start --pool vm-pool --limit 5
Then, deploy your flow to the pool:
from prefect import flow

@flow
def my_flow():
    ...

if __name__ == "__main__":
    # Option A: deploy from your code
    my_flow.deploy(
        name="prod",
        work_pool_name="vm-pool",
        tags=["team-a"],
        job_variables={
            # optional per-deployment overrides (env, image, etc.)
        },
    )
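As a concrete illustration of those overrides — a hedged sketch for a Docker work pool, assuming the image name is a placeholder you have already pushed to a registry and that your pool's base job template exposes the usual image/env keys:

from prefect import flow

@flow
def my_flow():
    ...

if __name__ == "__main__":
    my_flow.deploy(
        name="prod",
        work_pool_name="vm-pool",
        image="ghcr.io/org/my-flow:latest",       # placeholder image in your registry
        job_variables={"env": {"ENV": "prod"}},   # surfaced as environment variables in the container
        build=False,                              # skip building if the image already exists
    )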
Notes and best practices
- One vs many workers: Multiple workers can watch the same pool and will load-balance automatically. Workers are lightweight and stateless, so scaling horizontally across VMs is normal.
- When to create multiple pools: Use separate pools for different infrastructure types or environments (e.g., one for docker on VMs, another for k8s, another for dev vs prod). Within a single infra type/env, a single pool with multiple workers is typical.
- Routing/pinning with work queues: Within a pool, create work queues to route specific flows using tag/deployment filters. Point a worker to poll only certain queues if you need stickiness (e.g., “Team A” flows only run on VM1/VM2).
- Concurrency controls:
- Per-worker: prefect worker start --limit N
- Per-queue: set queue concurrency limits (helps isolate workloads)
- Per-pool: pool-level concurrency limit caps the total concurrent runs across all workers
- Worker types:
- process: simplest; runs in local process on the VM
- docker: runs each flow in a container on the VM; good isolation
- kubernetes/ecs/etc.: workers submit per-run jobs to those backends
- Push vs pull pools: For your VM scenario, use pull pools (workers required). Push/managed pools don’t need workers, but aren’t for executing on your own VMs.
Helpful docs
- Work pools (Prefect 3)
- Workers (Prefect 3)
- Customize job variables
- Work pool CLI
- Work queue CLI
CLI verification note: I attempted to run --help for the commands above but the check timed out in my environment. These are the standard Prefect 3.x commands; you can confirm locally with prefect work-pool create --help, prefect worker start --help, and prefect work-queue --help.
If you can share:
- your preferred worker type (process vs docker vs k8s/etc.)
- how many VMs and whether you need strict pinning (certain flows to certain VMs)
I can suggest an exact pool/queue/worker layout and sample configs.
Aaron
11/08/2025, 6:45 PM
Marvin
11/08/2025, 6:45 PM
Marvin
11/08/2025, 6:46 PM
For process workers, a pool per environment with a worker on each VM works well. For example:
# Create pools
prefect work-pool create --type process dev-process
prefect work-pool create --type process prod-process
# On Dev VM A (limit concurrency on this box)
prefect worker start --pool dev-process --limit 2
# On Dev VM B
prefect worker start --pool dev-process --limit 2
# On Prod VM A
prefect worker start --pool prod-process --limit 4
Deploy with explicit env per environment
- Pool-level (UI): set base job variables for each pool, e.g. env: {"ENV": "dev"} or {"ENV": "prod"}.
- Or per-deployment (code):
from prefect import flow

@flow
def my_flow():
    ...

if __name__ == "__main__":
    # Dev deployment
    my_flow.deploy(
        name="dev",
        work_pool_name="dev-process",
        job_variables={"env": {"ENV": "dev"}},  # per-run environment vars
    )

    # Prod deployment
    my_flow.deploy(
        name="prod",
        work_pool_name="prod-process",
        job_variables={"env": {"ENV": "prod"}},
    )
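For completeness, a small sketch of how the flow can pick up that ENV value at run time (a plain os.getenv read; the variable name simply mirrors the job_variables above):

import os
from prefect import flow, get_run_logger

@flow
def my_flow():
    log = get_run_logger()
    env = os.getenv("ENV", "dev")  # set via job_variables={"env": {"ENV": ...}} on the deployment
    log.info(f"Running with ENV={env}")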
How to spread runs across VMs
- Easiest: just start multiple workers on different VMs for the same pool. Prefect will balance runs across the available workers automatically. Use --limit per worker to cap concurrency on each VM.
- If you need hard pinning (certain deployments always on certain VMs):
- Use work queues within the pool. Create one queue per VM (e.g., dev-vm1, dev-vm2) with tag-based filters.
- Tag deployments (e.g., tags=["vm1"]) and configure the queue to pull only that tag.
- Start each worker to poll only its queue so those deployments always land on the intended VM.
Which to choose?
- If your goal is simply “don’t overload a VM,” use one pool per env + multiple workers with --limit. It’s the simplest and still balances well.
- If you need strict routing of specific deployments to specific VMs, add work queues and tag-based routing.
Useful docs
- Work pools: Concepts: Work pools
- Workers: Concepts: Workers
- Customize job variables (env, etc.): How-to: Customize job variables
- Work pool CLI: CLI: work-pool
- Work queue CLI: CLI: work-queue
Want me to sketch the work-queue/tag option with exact commands and example filters, or is the “just balance with per-worker limits” approach enough for your use case?
Aaron
11/08/2025, 6:56 PM
Marvin
11/08/2025, 6:56 PM
Marvin
11/08/2025, 6:59 PM
A) Simple balancing (no pinning)
- Keep one pool per env and start a worker on each VM with --limit to cap concurrency per VM
- Prefect will auto-balance runs across the workers in the pool
B) Simple pinning with queues (one pool per env, queues per VM)
- Keep one pool per env
- Create a work queue per VM (e.g., dev-vm1, dev-vm2)
- Tag your deployments (e.g., tags=["vm1"], tags=["vm2"])
- Set each queue’s filter to match those tags
- Start each worker to poll only its queue, so those tagged runs always land on that VM
How to configure queue filters (tags)
- In the UI: Work Pools -> your pool -> Work Queues -> create/edit queue and set a filter with tags.
- Or via API (example with curl):
# Example: pin 'vm1' tagged runs to the dev-vm1 queue in the dev-process pool
curl -X PATCH "$PREFECT_API_URL/work_pools/dev-process/queues/dev-vm1" \
  -H "Authorization: Bearer $PREFECT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filter": { "tags": ["vm1"] },
    "concurrency_limit": 2,
    "priority": 1
  }'
Deployments with tags (dev/prod shown too)
from prefect import flow

@flow
def my_flow():
    ...

# Dev deployment, pinned to vm1 via tag
my_flow.deploy(
    name="dev-vm1",
    work_pool_name="dev-process",
    tags=["dev", "vm1"],
    job_variables={"env": {"ENV": "dev"}},
)

# Prod deployment, pinned to vm2 via tag
my_flow.deploy(
    name="prod-vm2",
    work_pool_name="prod-process",
    tags=["prod", "vm2"],
    job_variables={"env": {"ENV": "prod"}},
)
Start workers
# Dev VM1
prefect worker start --pool dev-process --limit 2 # ideally also target the dev-vm1 queue
# Dev VM2
prefect worker start --pool dev-process --limit 2
# Prod VM2
prefect worker start --pool prod-process --limit 4
Tip: start each worker to poll only its VM’s queue for strict pinning (use the worker flag for queue selection; check prefect worker start --help to confirm the exact flag name in your version).
Docs and references
- Work pools: Concepts: Work pools
- Workers: Concepts: Workers
- Work queue API (shows filter structure with tags): REST: Update work queue
- Work-queue CLI: CLI: work-queue
- Customize job variables (env): How-to: Job variables
If you prefer zero pinning and just want to avoid overload, stick with pattern A: one pool per env + multiple workers with per-worker --limit. If you want me to script the queue creation/update for you (UI or API), tell me your pool/VM names and target tags and I’ll drop in exact commands.
Aaron
11/08/2025, 7:00 PM
Marvin
11/08/2025, 7:01 PM
Marvin
11/08/2025, 7:04 PM
A) UI-only routing (no API calls)
- Keep one pool per env: dev-process, prod-process
- Start one worker per VM against the appropriate pool and cap concurrency per VM with --limit
- Prefect will load-balance across workers in the same pool
- If you need strict pinning without filters, create separate pools per VM instead:
- dev-vm1-process, dev-vm2-process (and similarly for prod)
- Deployments target the specific pool you want to run on (see the sketch below)
- Start one worker per VM on its matching pool
This is the easiest strict routing approach using only the Server UI.
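A minimal sketch of that per-VM-pool routing (pool and deployment names are placeholders):

from prefect import flow

@flow
def my_flow():
    ...

if __name__ == "__main__":
    # Only the worker on dev VM1 polls "dev-vm1-process", so this always runs there.
    my_flow.deploy(
        name="dev-on-vm1",
        work_pool_name="dev-vm1-process",
        job_variables={"env": {"ENV": "dev"}},
    )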
B) One pool per env + queues per VM (requires API calls)
- Keep dev-process and prod-process pools
- Create a queue per VM (e.g., dev-vm1, dev-vm2)
- Tag your deployments (e.g., tags=["vm1"], tags=["vm2"])
- Set each queue’s filter via the REST API to match those tags
- Start each worker to poll only its queue for strict pinning
Example: set a queue filter via REST API
# Create a queue in the dev-process pool
curl -X POST "$PREFECT_API_URL/work_pools/dev-process/queues" \
  -H "Content-Type: application/json" \
  -d '{"name": "dev-vm1", "concurrency_limit": 2, "priority": 1}'

# Update the queue to filter by tag "vm1"
curl -X PATCH "$PREFECT_API_URL/work_pools/dev-process/queues/dev-vm1" \
  -H "Content-Type: application/json" \
  -d '{"filter": {"tags": ["vm1"]}}'
Notes:
- On Prefect Server, the Authorization header is typically not required (omit it unless you’ve enabled auth).
- After setting filters, start the worker on the VM and point it to the dev pool; configure it to poll only the dev-vm1 queue so “vm1”-tagged runs land on that VM.
Deployments with tags (dev/prod + VM tag)
from prefect import flow

@flow
def my_flow():
    ...

# Dev pinned to vm1
my_flow.deploy(
    name="dev-vm1",
    work_pool_name="dev-process",
    tags=["dev", "vm1"],
    job_variables={"env": {"ENV": "dev"}},
)

# Prod pinned to vm2
my_flow.deploy(
    name="prod-vm2",
    work_pool_name="prod-process",
    tags=["prod", "vm2"],
    job_variables={"env": {"ENV": "prod"}},
)
Links
- Work pools concept: Work pools (Prefect 3)
- REST API (update work queue; shows filter structure): Update work queue
- Workers concept: Workers (Prefect 3)
- Customize job variables: Customize job variables
If you’d prefer to avoid any API calls, I’d recommend the “separate pools per VM” approach for strict routing, or the “single pool per env + per-worker --limit” for simple balancing. If you’re open to a one-time script, I can give you exact curl commands for your pool/queue names to set up tag filters.