<@ULVA73B9P> I am hosting a Prefect 3 server. What...
# ask-marvin
a
@Marvin I am hosting a Prefect 3 server. What is the best way to run ingest of a source immediately followed by a dbt transformation afterwards? Separate definitions in Python? Two separate deployments triggered using `run_deployment`? Any other options?
m
For your ETL scenario with ingestion followed by dbt transformation, you have several options, each with its own advantages. Here are the recommended approaches, from simplest to most sophisticated:

1. Single Flow with Multiple Tasks

```python
from prefect import flow, task

@task
def ingest_data():
    # Your ingestion logic here
    pass

@task
def run_dbt():
    # Your dbt transformation logic here
    pass

@flow
def etl_pipeline():
    # Tasks called directly run one after another: run_dbt() starts only
    # after ingest_data() has returned, and a failed ingest raises here.
    ingest_data()
    run_dbt()
```
2. Parent-Child Flow Pattern

```python
from prefect import flow

@flow
def ingest_flow():
    # Your ingestion logic here
    pass

@flow
def dbt_flow():
    # Your dbt transformation logic here
    pass

@flow
def parent_flow():
    # Each call runs as a nested flow run and blocks until it finishes,
    # so dbt_flow() starts only after ingest_flow() has returned.
    ingest_flow()
    dbt_flow()
```
3. Separate Deployments with `run_deployment`

```python
from prefect import flow
from prefect.deployments import run_deployment

@flow
def orchestrator():
    # Run the ingest deployment ("flow-name/deployment-name").
    # run_deployment has no `wait` argument: it blocks until the flow run
    # finishes or `timeout` seconds elapse (timeout=0 returns immediately).
    ingest_result = run_deployment(
        name="ingest/default",
        timeout=300,
    )

    # Only run dbt if ingestion succeeded. If the timeout elapsed first,
    # the run is not in a final state yet and this check is False.
    if ingest_result.state and ingest_result.state.is_completed():
        dbt_result = run_deployment(
            name="dbt-transform/default",
            timeout=600,
        )
```
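The `if ingest_result.state.is_completed()` check above skips the dbt run when ingestion did not complete, but the orchestrator can then still finish without raising. A stricter variant might raise instead, so the failure is visible on the orchestrator run itself. This is only a sketch: `require_completed` and `UpstreamRunFailed` are hypothetical names, not part of Prefect, and the helper assumes only that the object passed in has a `.state` that is either `None` or exposes `is_completed()` and `name`, as the flow run returned by `run_deployment` does.

```python
from typing import Any


class UpstreamRunFailed(RuntimeError):
    """Raised when an upstream flow run did not reach a Completed state."""


def require_completed(flow_run: Any, label: str) -> Any:
    """Return `flow_run` unchanged if it completed; raise otherwise.

    Hypothetical helper. `flow_run` is duck-typed: it only needs a `.state`
    attribute that is None or has `is_completed()` and `name`.
    """
    state = getattr(flow_run, "state", None)
    if state is None or not state.is_completed():
        # Covers failed or crashed runs, and runs still in progress
        # because the timeout elapsed before they finished.
        state_name = getattr(state, "name", None) or "unknown"
        raise UpstreamRunFailed(
            f"{label} run ended in state {state_name!r}; skipping downstream steps"
        )
    return flow_run
```

Inside the orchestrator this could replace the `if` check, for example `require_completed(run_deployment(name="ingest/default", timeout=300), "ingest")` before triggering the dbt deployment. An exception raised inside a flow marks that flow run as failed, which is what notifications would typically key on.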
Recommendations:

1. For Simple Workflows: Use the single flow with multiple tasks approach. It's the most straightforward and provides built-in dependency management.
2. For Reusable Components: Use the parent-child flow pattern when you want to:
   - Reuse the individual flows independently
   - Have separate logging for each major step
   - Maintain cleaner separation of concerns
3. For Complex Orchestration: Use `run_deployment` when you need:
   - Dynamic deployment selection
   - Different scheduling patterns for each component
   - Ability to trigger the components independently
   - Different infrastructure/work pools for each component

Additional Considerations:

- If using dbt, consider the prefect-dbt integration package
- For production environments, ensure proper error handling and notifications
- `run_deployment` waits for the triggered flow run to finish by default (`timeout=None`), which keeps execution sequential; it has no `wait` argument, and `timeout=0` returns immediately without waiting
- Use flow parameters to pass data between flows if needed

The choice largely depends on:

- How often you need to run each component independently
- Whether the components need different infrastructure/resources
- How tightly coupled the steps are
- Your monitoring and observability requirements

For your specific case, since you're self-hosting a Prefect server, I'd recommend starting with the parent-child flow pattern, as it provides a good balance of separation and simplicity while maintaining clear dependency relationships.
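As a rough idea of what the "Your dbt transformation logic here" placeholders could contain, here is a minimal sketch that shells out to the dbt CLI with the standard library. It deliberately does not use the prefect-dbt package. The function names, the `analytics` project directory, and the `staging+` selector are all hypothetical, and it assumes `dbt` is installed and on the `PATH` of whatever runs the flow.

```python
import subprocess
from typing import Optional


def build_dbt_command(
    project_dir: str, command: str = "build", select: Optional[str] = None
) -> list[str]:
    """Assemble the argument list for a dbt CLI call (hypothetical helper)."""
    args = ["dbt", command, "--project-dir", project_dir]
    if select:
        # Limit the run to a subset of models, e.g. "staging+"
        args += ["--select", select]
    return args


def run_dbt_cli(
    project_dir: str, command: str = "build", select: Optional[str] = None
) -> None:
    """Run dbt and raise CalledProcessError if it exits with a non-zero code."""
    subprocess.run(build_dbt_command(project_dir, command, select), check=True)
```

Calling `run_dbt_cli("analytics", select="staging+")` from inside the `run_dbt` task or `dbt_flow` would mean a failing dbt invocation raises, so the task or flow run fails rather than completing quietly.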