# marvin-ai
@Marvin I'm running a dbt flow with Prefect 3.0 using `DbtCoreOperation` inside a managed work pool. Everything starts fine, targeting `dev`, but after ~17 minutes the flow randomly starts looking for models in the `RAW` database instead. This happens inconsistently across models, e.g. `mrr`, `lead`, etc. I'm using a `profiles.yml` available in my dbt project.
thought for 40.0 seconds
Based on your description and my research, I can identify a few potential issues and solutions:

**1. Environment Consistency in Managed Work Pools**

The behavior you're describing, where the target database suddenly changes to `RAW` after ~17 minutes, suggests there might be an issue with environment persistence in the managed work pool. This could be caused by:
- Session timeouts
- Environment variable changes
- Profile configuration not being properly persisted

**2. Suggested Solutions**

**a) Explicit Profile Configuration**

Instead of relying solely on the `profiles.yml` file, try explicitly configuring your target using `DbtCliProfile` and `SnowflakeTargetConfigs` (or the equivalent for your database):
```python
from prefect_dbt import DbtCoreOperation
from prefect_dbt.cli.credentials import DbtCliProfile
from prefect_dbt.cli.configs import SnowflakeTargetConfigs  # or your DB's equivalent
from prefect_snowflake.credentials import SnowflakeCredentials
from prefect_snowflake.database import SnowflakeConnector

# Create an explicit target configuration so the database/schema
# can't drift away from what you expect
connector = SnowflakeConnector(
    credentials=SnowflakeCredentials(
        user="your_user",
        password="your_password",
        account="your_account",
    ),
    database="your_database",
    schema="your_schema",
    warehouse="your_warehouse",
)
target_configs = SnowflakeTargetConfigs(connector=connector)

dbt_cli_profile = DbtCliProfile(
    name="your_project",
    target="dev",
    target_configs=target_configs,
)

# Use in your DbtCoreOperation; overwrite_profiles makes the explicit
# profile take precedence over any profiles.yml on disk
dbt_op = DbtCoreOperation(
    commands=["dbt run"],
    dbt_cli_profile=dbt_cli_profile,
    overwrite_profiles=True,
    project_dir="path/to/your/project",
)
```
**b) Profile Path Verification**

Make sure your `profiles.yml` is in a location that's accessible to the managed work pool:
```python
from prefect_dbt import DbtCoreOperation

dbt_op = DbtCoreOperation(
    commands=["dbt run"],
    profiles_dir="/absolute/path/to/profiles/directory",  # be explicit about the location
    project_dir="path/to/your/project",
)
```
**3. Additional Debugging Steps**

- Add a `dbt debug` command before your main dbt operations to verify configuration:
```python
from prefect_dbt import DbtCoreOperation

# Verify the resolved profile and connection before the main run
debug_op = DbtCoreOperation(
    commands=["dbt debug"],
    profiles_dir=profiles_dir,
    project_dir=project_dir,
)
```
- Log the current working directory and environment variables at the start of your flow (a minimal sketch is included at the end of this reply)

**4. Request for More Information**

To help diagnose this better, could you please provide:
- The exact error messages you're seeing when it tries to access the RAW database
- Your flow code structure
- Whether you're using any database-specific blocks (like SnowflakeBlock, RedshiftBlock, etc.)
- The content of your `profiles.yml` (with sensitive information redacted)

This seems like it could be related to how credentials and configurations are being maintained in the managed work pool environment. The 17-minute mark suggests it might be related to session management or credential expiration.
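As referenced above, a minimal sketch of that environment check (the flow name and env-var prefixes are illustrative, not from the original thread):

```python
import os
from prefect import flow, get_run_logger

@flow
def log_environment():
    logger = get_run_logger()
    # Record where the run is executing
    logger.info(f"cwd: {os.getcwd()}")
    # Log any dbt- or Snowflake-related environment variables that are set
    # (redact secrets before doing this in a real deployment)
    for key, value in os.environ.items():
        if key.startswith(("DBT_", "SNOWFLAKE_")):
            logger.info(f"{key}={value}")
```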
@Marvin Error: `002003 (02000): SQL compilation error: Schema 'RAW.INT_MTING' does not exist or not authorized.` I don't want to use RAW; I don't have any RAW db mentioned in my profiles.yml.
thought for 40.4 seconds
This behavior, where dbt suddenly tries to access a `RAW` schema that isn't defined in your `profiles.yml`, suggests a few potential issues:

**1. Model References**

The most likely cause is that one of your dbt models has a reference to the RAW schema, either:
- In a `ref()` or `source()` function
- In raw SQL that uses fully qualified names
- In a schema override in your `dbt_project.yml`

**2. Custom Macros**

If you're using any custom macros, they might be generating schema names dynamically and defaulting to RAW in some conditions.

To debug this, I recommend:

**1. Add Verbose Logging**
```python
from prefect_dbt import DbtCoreOperation

dbt_op = DbtCoreOperation(
    commands=["dbt --debug run"],  # --debug enables verbose dbt logging
    project_dir="your/project/dir",
    profiles_dir="path/to/profiles",
)
```
**2. Check Model Compilation**

Run a compilation check first to see the generated SQL:
```python
from prefect_dbt import DbtCoreOperation

# Compile without running to inspect the SQL dbt generates
compile_op = DbtCoreOperation(
    commands=["dbt compile"],
    project_dir="your/project/dir",
    profiles_dir="path/to/profiles",
)
```
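Once the project is compiled, a small helper like the one below can scan the generated SQL for hard-coded RAW references. This is a hypothetical sketch, not from the original thread; it assumes dbt's default `target/compiled` output directory:

```python
from pathlib import Path

# Look through the compiled SQL produced by `dbt compile`
compiled_dir = Path("your/project/dir") / "target" / "compiled"

for sql_file in compiled_dir.rglob("*.sql"):
    if "RAW." in sql_file.read_text().upper():
        print(f"RAW reference found in {sql_file}")
```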
**3.** Could you share (with sensitive info removed):
- The content of your `dbt_project.yml`
- The model code for the `mrr` and `lead` models you mentioned
- Any custom macros you might be using

**4. Temporary Debugging Step**

You can also try running with the `--models` flag to isolate which model triggers this:
```python
from prefect_dbt import DbtCoreOperation

# Test just one model to isolate the failure
dbt_op = DbtCoreOperation(
    commands=["dbt run --models mrr"],
    project_dir="your/project/dir",
    profiles_dir="path/to/profiles",
)
```
The fact that this happens after ~17 minutes and inconsistently across models suggests it might be related to:
- Session timeouts
- Connection pool issues
- Resource constraints in the managed work pool

Could you also confirm:
1. Are you using Snowflake (based on the error format)?
2. Does this happen with the same models each time, or is it random?
3. Do you see any pattern in which models trigger this behavior?

This will help us narrow down whether it's a configuration issue, a model issue, or an infrastructure issue.
@Marvin 1. This is Snowflake-based. 2. This happens mostly with 3 or 4 models, but not all the time; sometimes it happens with mrr, sometimes with lead, other times it could be another. 3. No.
thought for 20.8 seconds
Based on the symptoms you've described:
- Snowflake backend
- Random models affected
- Occurs after extended runtime (~17 minutes)
- Incorrectly tries to access the RAW schema
- Happens in managed work pools

this strongly suggests a Snowflake session timeout or connection reset issue. Here are specific solutions to try:

**1. Adjust Snowflake Session Parameters**

In your `profiles.yml`, add these session parameters:
```yaml
your_profile_name:
  outputs:
    dev:
      type: snowflake
      # ... your other configs ...
      client_session_keep_alive: true   # documented dbt-snowflake setting to keep sessions alive
      session_parameters:
        STATEMENT_TIMEOUT_IN_SECONDS: 3600  # 1 hour
```
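If you build the profile in code instead (as in the explicit-profile example earlier in this thread), the same settings can probably be passed through the target configs' `extras`, which prefect-dbt merges into the generated profile. A sketch under that assumption:

```python
from prefect_dbt.cli.configs import SnowflakeTargetConfigs
from prefect_snowflake.credentials import SnowflakeCredentials
from prefect_snowflake.database import SnowflakeConnector

connector = SnowflakeConnector(
    credentials=SnowflakeCredentials(user="...", password="...", account="..."),
    database="your_database",
    schema="your_schema",
    warehouse="your_warehouse",
)

# extras are merged into the dbt target that prefect-dbt generates
target_configs = SnowflakeTargetConfigs(
    connector=connector,
    extras={
        "client_session_keep_alive": True,  # keep the Snowflake session alive
        "connect_retries": 2,               # retry dropped connections
    },
)
```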
**2. Use Connection Retry Logic**

Modify your dbt operation to include retry configuration:
```python
from prefect import flow
from prefect_dbt import DbtCoreOperation

# Retry the whole flow if the dbt run fails partway through
@flow(retries=2, retry_delay_seconds=30)
def dbt_flow():
    dbt_op = DbtCoreOperation(
        commands=["dbt run"],
        project_dir="your/project/dir",
        profiles_dir="path/to/profiles",
        # Snowflake-specific settings such as client_session_keep_alive
        # belong in profiles.yml (see above) rather than here
    )
    return dbt_op.run()
```
**3. Split Your Models**

If the issue persists, consider breaking up your dbt run into smaller chunks:
```python
from prefect import flow
from prefect_dbt import DbtCoreOperation

@flow
def dbt_flow():
    # First batch
    DbtCoreOperation(
        commands=["dbt run --models tag:batch1"],
        project_dir="your/project/dir",
        profiles_dir="path/to/profiles",
    ).run()

    # Second batch
    DbtCoreOperation(
        commands=["dbt run --models tag:batch2"],
        project_dir="your/project/dir",
        profiles_dir="path/to/profiles",
    ).run()
```
**4. Add Connection Health Check**

Add a pre-run check to verify connection status:
```python
from prefect import flow
from prefect_dbt import DbtCoreOperation

@flow
def dbt_flow():
    # Check the connection first; .run() raises if `dbt debug` fails,
    # so the flow stops before the main run
    DbtCoreOperation(
        commands=["dbt debug"],
        project_dir="your/project/dir",
        profiles_dir="path/to/profiles",
    ).run()

    # Proceed with main operation
    return DbtCoreOperation(
        commands=["dbt run"],
        project_dir="your/project/dir",
        profiles_dir="path/to/profiles",
    ).run()
```
**5. Monitor and Log**

Add logging to track when these switches occur:
```python
from prefect import flow, get_run_logger
from prefect_dbt import DbtCoreOperation

@flow
def dbt_flow():
    logger = get_run_logger()

    try:
        result = DbtCoreOperation(
            commands=["dbt --debug run"],  # --debug for verbose dbt logging
            project_dir="your/project/dir",
            profiles_dir="path/to/profiles",
        ).run()
        logger.info(f"dbt operation completed: {result}")
    except Exception as e:
        logger.error(f"dbt operation failed: {e}")
        raise
```
Would you like me to provide more details about any of these solutions? Also, it would be helpful to know:
1. What's the typical total runtime of your dbt flow?
2. Are you using any custom target configurations in your `dbt_project.yml`?
3. Have you noticed whether this happens more frequently during peak usage times?

This information would help us further refine the solution to your specific case.