# ask-marvin
@Marvin, what is best practice to run the same flow with different parameters? is that to create multiple deployments? how would one go about doing that?
thought for 61.4 seconds
There are two main approaches to running the same flow with different parameters, and the best choice depends on your specific needs.

**1. Multiple Deployments (recommended for production)**

This approach is ideal when you have well-defined, distinct parameter sets that you want to run on a schedule or trigger separately. Here's how to set it up:
```yaml
# prefect.yaml
deployments:
  - name: process-customer-data-us
    entrypoint: flows/process_data.py:main_flow
    parameters:
      region: "US"
      batch_size: 1000
    work_pool:
      name: my-work-pool
      
  - name: process-customer-data-eu
    entrypoint: flows/process_data.py:main_flow
    parameters:
      region: "EU"
      batch_size: 500
    work_pool:
      name: my-work-pool
```
You can deploy these configurations using:
```bash
prefect deploy --all  # or deploy a single one: prefect deploy --name process-customer-data-us
```
**2. Ad-hoc Flow Runs (better for development/testing)**

For one-off runs or testing, you can run the flow directly with different parameters:
```python
import asyncio

from prefect import flow
from prefect.client.orchestration import get_client


@flow
def my_flow(region: str, batch_size: int):
    print(f"Processing {region} with batch size {batch_size}")


# Run directly with different parameters
my_flow("US", 1000)
my_flow("EU", 500)


# Or create flow runs from an existing deployment programmatically
async def trigger_run():
    async with get_client() as client:
        await client.create_flow_run_from_deployment(
            deployment_id="deployment-id",  # your deployment's UUID
            parameters={"region": "US", "batch_size": 1000},
        )


asyncio.run(trigger_run())
```
Some key considerations:

1. Each deployment can have its own:
   - Parameters
   - Schedule
   - Work pool configuration
   - Infrastructure settings
2. Using multiple deployments is beneficial when you need:
   - Different schedules for different parameter sets
   - Separate monitoring and observability for each parameter configuration
   - Different infrastructure requirements based on parameters
   - Clear separation of concerns in your workflow management
3. The `prefect.yaml` file can use YAML anchors and aliases to avoid repetition when you have many similar deployments.

Remember that in Prefect 3.x, deployments are managed through the `prefect.yaml` file or programmatically with `flow.from_source().deploy()`; a sketch of that pattern follows below. This is different from Prefect 2.x, which used the now-removed `Deployment.build_from_flow()` method. For more details on deployment configuration options, check out the deployment documentation.
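To make the programmatic route concrete, here is a minimal sketch of the `flow.from_source().deploy()` pattern, reusing the names from the `prefect.yaml` example above (the repository URL is a hypothetical placeholder):

```python
from prefect import flow

if __name__ == "__main__":
    # Load the flow from remote storage, then register a deployment for it.
    # The repository URL below is a placeholder, not a value from this thread.
    flow.from_source(
        source="https://github.com/your-org/your-repo.git",
        entrypoint="flows/process_data.py:main_flow",
    ).deploy(
        name="process-customer-data-us",
        work_pool_name="my-work-pool",
        parameters={"region": "US", "batch_size": 1000},
    )
```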
@Marvin, is a prefect.yaml file best practice in prefect 3? cause now I use deploy in each file.
thought for 63.7 seconds
Both approaches are valid in Prefect 3.x, and your current approach of using in-code deployments is perfectly fine! Here's a breakdown of when to use each.

**In-Code Deployments (your current approach):**
```python
from prefect import flow


@flow
def my_flow(param1: str = "value1"):
    ...


if __name__ == "__main__":
    my_flow.deploy(
        name="my-deployment",
        work_pool_name="my-pool",
        # Docker-based work pools need an image; alternatively, load the
        # flow with flow.from_source(...) so workers can pull the code.
        image="my-registry/my-image:latest",
        parameters={"param1": "value1"},
    )
```
Advantages:
- More Pythonic and familiar for developers
- Easier to implement dynamic/programmatic deployment logic
- Great for CI/CD pipelines
- Keeps deployment configuration close to the flow code
- Simpler to get started with
- Better for automated deployment management

**`prefect.yaml` Approach:**
```yaml
deployments:
  - name: my-deployment
    entrypoint: flows/my_flow.py:my_flow
    work_pool:
      name: my-pool
    parameters:
      param1: value1
```
Advantages:
- Better for managing multiple deployments across a large project
- Easier to version control deployment configurations
- More suitable for teams that prefer declarative configuration
- Clearer overview of all deployments in one place
- Better for infrastructure-as-code approaches

Your current approach of using in-code deployments is absolutely fine and follows modern Prefect practices. In fact, it's often simpler and more maintainable for many use cases, especially when:
- You're working on smaller to medium-sized projects
- You want to keep deployment logic close to your flow code
- You need programmatic control over deployments (see the sketch below)
- You're implementing CI/CD pipelines

The `prefect.yaml` approach becomes more valuable when:
- You're managing many deployments across a large project
- You need a central place to view all deployment configurations
- You're working in a team that prefers infrastructure-as-code
- You want to separate deployment configuration from flow logic

Bottom line: there's no strict requirement to switch to `prefect.yaml` if your current in-code deployment approach is working well for you. Both methods are supported and considered best practice depending on your specific needs.
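As a concrete illustration of the "programmatic control" advantage, here is a minimal sketch that registers one deployment per parameter set in a loop, something a static `prefect.yaml` cannot express (the pool and image names are placeholders):

```python
from prefect import flow


@flow
def process_data(region: str, batch_size: int):
    ...


if __name__ == "__main__":
    # One deployment per parameter set, generated programmatically.
    for region, batch_size in [("US", 1000), ("EU", 500)]:
        process_data.deploy(
            name=f"process-customer-data-{region.lower()}",
            work_pool_name="my-work-pool",  # placeholder pool name
            image="my-registry/my-image:latest",  # placeholder image
            parameters={"region": region, "batch_size": batch_size},
        )
```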
@Marvin especially considering the ability to run a flow (dbt in my case) with different parameters, the prefect.yaml seems superior? Also give me best practice recommendations with regards to CICD.
thought for 57.0 seconds
You're absolutely right! For dbt workflows with varying parameters, `prefect.yaml` does offer superior organization and clarity. Let me show you a comprehensive approach combining `prefect.yaml` with CI/CD best practices. Here's an example `prefect.yaml` that demonstrates parameter management for different dbt environments:
```yaml
# prefect.yaml

# Pull steps run when a worker picks up a flow run for these deployments
pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/your-org/your-repo.git
      branch: main

# Shared configuration, reused below via a YAML anchor
definitions:
  common: &base-deployment
    entrypoint: flows/dbt_flow.py:dbt_flow
    work_pool:
      name: dbt-pool
      work_queue_name: default

deployments:
  # Development environment
  - <<: *base-deployment
    name: dbt-dev
    parameters:
      profiles_dir: "."
      project_dir: "dbt_project"
      select: "tag:daily"  # dbt selector
      target: "dev"
      vars:
        env: "dev"
        days_lookback: 7

  # Production environment
  - <<: *base-deployment
    name: dbt-prod
    parameters:
      profiles_dir: "."
      project_dir: "dbt_project"
      select: "tag:daily"
      target: "prod"
      vars:
        env: "prod"
        days_lookback: 30
```
And here's a corresponding GitHub Actions workflow that demonstrates CI/CD best practices:
```yaml
# .github/workflows/prefect-cicd.yml
name: Prefect CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  deploy-flows:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
          
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install prefect prefect-dbt dbt-core dbt-snowflake  # adjust for your dbt adapter
          
      - name: Deploy flows
        env:
          # Authenticate via environment variables rather than an
          # interactive `prefect cloud login`
          PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
          PREFECT_API_URL: ${{ secrets.PREFECT_API_URL }}
        run: |
          # Deploy only the dev deployment on pull requests
          if [[ $GITHUB_EVENT_NAME == 'pull_request' ]]; then
            prefect deploy --name dbt-dev
          fi

          # Deploy everything on a push to main
          if [[ $GITHUB_EVENT_NAME == 'push' && $GITHUB_REF == 'refs/heads/main' ]]; then
            prefect deploy --all
          fi
```
Here's the corresponding dbt flow code (`flows/dbt_flow.py`):
```python
import json
from typing import Dict, Optional

from prefect import flow
from prefect_dbt.cli.commands import DbtCoreOperation


@flow
def dbt_flow(
    profiles_dir: str,
    project_dir: str,
    select: str,
    target: str,
    vars: Optional[Dict] = None,
):
    # DbtCoreOperation runs shell commands, so the dbt options go into
    # the command string itself rather than separate keyword arguments.
    command = f"dbt build --select {select} --target {target}"
    if vars:
        command += f" --vars '{json.dumps(vars)}'"

    return DbtCoreOperation(
        commands=[command],
        project_dir=project_dir,
        profiles_dir=profiles_dir,
    ).run()


if __name__ == "__main__":
    # Local testing only - deployment handled by prefect.yaml
    dbt_flow(
        profiles_dir=".",
        project_dir="dbt_project",
        select="tag:daily",
        target="dev",
        vars={"env": "dev", "days_lookback": 7},
    )
```
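Once these deployments exist, you can also trigger ad-hoc runs with parameter overrides. A minimal sketch using `run_deployment` (the `dbt-flow/dbt-prod` name assumes the flow above combined with the `dbt-prod` deployment from the `prefect.yaml`; the `days_lookback` override is illustrative):

```python
from prefect.deployments import run_deployment

# Kick off the prod deployment, overriding its default parameters for
# this run only. Deployment names follow "flow-name/deployment-name".
flow_run = run_deployment(
    name="dbt-flow/dbt-prod",
    parameters={
        "profiles_dir": ".",
        "project_dir": "dbt_project",
        "select": "tag:daily",
        "target": "prod",
        "vars": {"env": "prod", "days_lookback": 90},
    },
)
```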
Key Best Practices:

1. Environment Separation:
   - Use YAML anchors (`&base-deployment`) to