# marvin-ai
@Marvin I'm running a dbt flow with Prefect 3.0 using `DbtCoreOperation` inside a managed work pool. Everything starts fine, targeting `dev`, but after ~17 minutes the flow randomly starts looking for models in the `RAW` database instead. This happens inconsistently across models, e.g. `mrr`, `lead`, etc. I'm using a `profiles.yml` available in my dbt project.
thought for 40.0 seconds
Based on your description and my research, I can identify a few potential issues and solutions:

**1. Environment Consistency in Managed Work Pools**

The behavior you're describing, where the target database suddenly changes to `RAW` after ~17 minutes, suggests there might be an issue with environment persistence in the managed work pool. This could be caused by:
- Session timeouts
- Environment variable changes
- Profile configuration not being properly persisted

**2. Suggested Solutions**

**a) Explicit Profile Configuration**

Instead of relying solely on the `profiles.yml` file, try explicitly configuring your target using `DbtCliProfile` and `SnowflakeTargetConfigs` (or the equivalent for your database):
```python
from prefect_dbt import DbtCoreOperation
from prefect_dbt.cli.credentials import DbtCliProfile
from prefect_dbt.cli.configs import SnowflakeTargetConfigs  # or your DB's equivalent
from prefect_snowflake.credentials import SnowflakeCredentials
from prefect_snowflake.database import SnowflakeConnector

# Create an explicit target configuration so the database/schema
# can't drift away from what you expect
connector = SnowflakeConnector(
    credentials=SnowflakeCredentials(
        user="your_user",
        password="your_password",
        account="your_account",
    ),
    database="your_database",
    schema="your_schema",
    warehouse="your_warehouse",
)
target_configs = SnowflakeTargetConfigs(connector=connector)

dbt_cli_profile = DbtCliProfile(
    name="your_project",
    target="dev",
    target_configs=target_configs,
)

# Use in your DbtCoreOperation; overwrite_profiles makes the explicit
# profile take precedence over any profiles.yml on disk
dbt_op = DbtCoreOperation(
    commands=["dbt run"],
    dbt_cli_profile=dbt_cli_profile,
    overwrite_profiles=True,
    project_dir="path/to/your/project",
)
```
**b) Profile Path Verification**

Make sure your `profiles.yml` is in a location that's accessible to the managed work pool:
```python
from prefect_dbt import DbtCoreOperation

dbt_op = DbtCoreOperation(
    commands=["dbt run"],
    profiles_dir="/absolute/path/to/profiles/directory",  # be explicit about the location
    project_dir="path/to/your/project",
)
```
**3. Additional Debugging Steps**

- Add a `dbt debug` command before your main dbt operations to verify configuration:
```python
from prefect_dbt import DbtCoreOperation

# Verify the resolved profile and connection before the main run
debug_op = DbtCoreOperation(
    commands=["dbt debug"],
    profiles_dir=profiles_dir,
    project_dir=project_dir,
)
```
- Log the current working directory and environment variables at the start of your flow (a minimal sketch is included at the end of this reply)

**4. Request for More Information**

To help diagnose this better, could you please provide:
- The exact error messages you're seeing when it tries to access the RAW database
- Your flow code structure
- Whether you're using any database-specific blocks (like SnowflakeBlock, RedshiftBlock, etc.)
- The content of your `profiles.yml` (with sensitive information redacted)

This seems like it could be related to how credentials and configurations are being maintained in the managed work pool environment. The 17-minute mark suggests it might be related to session management or credential expiration.
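As referenced above, a minimal sketch of that environment check (the flow name and env-var prefixes are illustrative, not from the original thread):

```python
import os
from prefect import flow, get_run_logger

@flow
def log_environment():
    logger = get_run_logger()
    # Record where the run is executing
    logger.info(f"cwd: {os.getcwd()}")
    # Log any dbt- or Snowflake-related environment variables that are set
    # (redact secrets before doing this in a real deployment)
    for key, value in os.environ.items():
        if key.startswith(("DBT_", "SNOWFLAKE_")):
            logger.info(f"{key}={value}")
```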
@Marvin Error: `002003 (02000): SQL compilation error: Schema 'RAW.INT_MTING' does not exist or not authorized.` I don't want to use RAW; I don't have any RAW db mentioned in my profiles.yml.
thought for 40.4 seconds
This behavior, where dbt suddenly tries to access a `RAW` schema that isn't defined in your `profiles.yml`, suggests a few potential issues:

**1. Model References**

The most likely cause is that one of your dbt models has a reference to the RAW schema, either:
- In a `ref()` or `source()` function
- In raw SQL that uses fully qualified names
- In a schema override in your `dbt_project.yml`

**2. Custom Macros**

If you're using any custom macros, they might be generating schema names dynamically and defaulting to RAW in some conditions.

To debug this, I recommend:

**1. Add Verbose Logging**
```python
from prefect_dbt import DbtCoreOperation

dbt_op = DbtCoreOperation(
    commands=["dbt --debug run"],  # --debug enables verbose dbt logging
    project_dir="your/project/dir",
    profiles_dir="path/to/profiles",
)
```
**2. Check Model Compilation**

Run a compilation check first to see the generated SQL:
```python
from prefect_dbt import DbtCoreOperation

# Compile without running to inspect the SQL dbt generates
compile_op = DbtCoreOperation(
    commands=["dbt compile"],
    project_dir="your/project/dir",
    profiles_dir="path/to/profiles",
)
```
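Once the project is compiled, a small helper like the one below can scan the generated SQL for hard-coded RAW references. This is a hypothetical sketch, not from the original thread; it assumes dbt's default `target/compiled` output directory:

```python
from pathlib import Path

# Look through the compiled SQL produced by `dbt compile`
compiled_dir = Path("your/project/dir") / "target" / "compiled"

for sql_file in compiled_dir.rglob("*.sql"):
    if "RAW." in sql_file.read_text().upper():
        print(f"RAW reference found in {sql_file}")
```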
**3.** Could you share (with sensitive info removed):
- The content of your `dbt_project.yml`
- The model code for the `mrr` and `lead` models you mentioned
- Any custom macros you might be using

**4. Temporary Debugging Step**

You can also try running with the `--models` flag to isolate which model triggers this:
```python
from prefect_dbt import DbtCoreOperation

# Test just one model to isolate the failure
dbt_op = DbtCoreOperation(
    commands=["dbt run --models mrr"],
    project_dir="your/project/dir",
    profiles_dir="path/to/profiles",
)
```
The fact that this happens after ~17 minutes and inconsistently across models suggests it might be related to:
- Session timeouts
- Connection pool issues
- Resource constraints in the managed work pool

Could you also confirm:
1. Are you using Snowflake (based on the error format)?
2. Does this happen with the same models each time, or is it random?
3. Do you see any pattern in which models trigger this behavior?

This will help us narrow down whether it's a configuration issue, a model issue, or an infrastructure issue.
@Marvin 1. This is Snowflake-based. 2. This happens mostly with 3 or 4 models, but not all the time; sometimes it happens with mrr, sometimes with lead, other times it could be another. 3. No.
thought for 20.8 seconds
Based on the symptoms you've described:
- Snowflake backend
- Random models affected
- Occurs after extended runtime (~17 minutes)
- Incorrectly tries to access the RAW schema
- Happens in managed work pools

this strongly suggests a Snowflake session timeout or connection reset issue. Here are specific solutions to try:

**1. Adjust Snowflake Session Parameters**

In your `profiles.yml`, add these session parameters:
```yaml
your_profile_name:
  outputs:
    dev:
      type: snowflake
      # ... your other configs ...
      client_session_keep_alive: true   # documented dbt-snowflake setting to keep sessions alive
      session_parameters:
        STATEMENT_TIMEOUT_IN_SECONDS: 3600  # 1 hour
```
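If you build the profile in code instead (as in the explicit-profile example earlier in this thread), the same settings can probably be passed through the target configs' `extras`, which prefect-dbt merges into the generated profile. A sketch under that assumption:

```python
from prefect_dbt.cli.configs import SnowflakeTargetConfigs
from prefect_snowflake.credentials import SnowflakeCredentials
from prefect_snowflake.database import SnowflakeConnector

connector = SnowflakeConnector(
    credentials=SnowflakeCredentials(user="...", password="...", account="..."),
    database="your_database",
    schema="your_schema",
    warehouse="your_warehouse",
)

# extras are merged into the dbt target that prefect-dbt generates
target_configs = SnowflakeTargetConfigs(
    connector=connector,
    extras={
        "client_session_keep_alive": True,  # keep the Snowflake session alive
        "connect_retries": 2,               # retry dropped connections
    },
)
```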
**2. Use Connection Retry Logic**

Modify your dbt operation to include retry configuration:
```python
from prefect import flow
from prefect_dbt import DbtCoreOperation

# Retry the whole flow if the dbt run fails partway through
@flow(retries=2, retry_delay_seconds=30)
def dbt_flow():
    dbt_op = DbtCoreOperation(
        commands=["dbt run"],
        project_dir="your/project/dir",
        profiles_dir="path/to/profiles",
        # Snowflake-specific settings such as client_session_keep_alive
        # belong in profiles.yml (see above) rather than here
    )
    return dbt_op.run()
```
**3. Split Your Models**

If the issue persists, consider breaking up your dbt run into smaller chunks:
```python
from prefect import flow
from prefect_dbt import DbtCoreOperation

@flow
def dbt_flow():
    # First batch
    DbtCoreOperation(
        commands=["dbt run --models tag:batch1"],
        project_dir="your/project/dir",
        profiles_dir="path/to/profiles",
    ).run()

    # Second batch
    DbtCoreOperation(
        commands=["dbt run --models tag:batch2"],
        project_dir="your/project/dir",
        profiles_dir="path/to/profiles",
    ).run()
```
**4. Add Connection Health Check**

Add a pre-run check to verify connection status:
```python
from prefect import flow
from prefect_dbt import DbtCoreOperation

@flow
def dbt_flow():
    # Check the connection first; .run() raises if `dbt debug` fails,
    # so the flow stops before the main run
    DbtCoreOperation(
        commands=["dbt debug"],
        project_dir="your/project/dir",
        profiles_dir="path/to/profiles",
    ).run()

    # Proceed with main operation
    return DbtCoreOperation(
        commands=["dbt run"],
        project_dir="your/project/dir",
        profiles_dir="path/to/profiles",
    ).run()
```
**5. Monitor and Log**

Add logging to track when these switches occur:
```python
from prefect import flow, get_run_logger
from prefect_dbt import DbtCoreOperation

@flow
def dbt_flow():
    logger = get_run_logger()

    try:
        result = DbtCoreOperation(
            commands=["dbt --debug run"],  # --debug for verbose dbt logging
            project_dir="your/project/dir",
            profiles_dir="path/to/profiles",
        ).run()
        logger.info(f"dbt operation completed: {result}")
    except Exception as e:
        logger.error(f"dbt operation failed: {e}")
        raise
```
Would you like me to provide more details about any of these solutions? Also, it would be helpful to know:
1. What's the typical total runtime of your dbt flow?
2. Are you using any custom target configurations in your `dbt_project.yml`?
3. Have you noticed whether this happens more frequently during peak usage times?

This information would help us further refine the solution to your specific case.