I want to run the same DbtShellTask on two different schemas Prefect Community #ask-community

I want to run the same DbtShellTask on two differe...

Apoorva Desai

06/28/2022, 8:59 PM

I want to run the same DbtShellTask on two different schemas in one flow. Is there an efficient way to do this? I am currently doing this and it feels chunky and repetitive:

code snippet moved to thread

The two DbtShellTasks are identical except for the schema. I tried defining the DbtShellTask without schema outside of my flow and tried a few different things inside my flow:

transform_task.run(dbt_kwargs = {"schema": "schema_1"})

but I always get the same error:

Copy code

ValueError: Could not infer an active Flow context while creating edge to <Task: DbtShellTask>. This often means you called a task outside a `with Flow(...)` block. If you're trying to run this task outside of a Flow context, you need to call `DbtShellTask(...).run(...)`

I can tell I am doing something silly but can't seem to resolve it 😅

Wei Mei

06/28/2022, 9:18 PM

Hi! I totally don’t know the answer but I am in the same boat as you! I am trying to use a different command=“” for different clocks. Maybe this might work for you?

Wei Mei

06/28/2022, 9:19 PM

Copy code

dbt = DbtShellTask(
    return_all=True,
    profile_name="snowflake",
    environment="prod", #The env, change to dev if you are trying to do it locally
    profiles_dir=Secret("DBT_PROFILE_DIR_T2").get(), #Change to the local dir where your profiles.yml is located
    overwrite_profiles=False,
    log_stdout=True,
    helper_script="cd dbt",
    log_stderr=True,
)

@task(name="DBT run", nout=2)
def dbt_run(clock, dbt_command):
    if clock == "full_run" and dbt_command == "orders":
        <http://logger.info|logger.info>("DBT run all")
        run = dbt.run(
            command=f"dbt run",
        )
    else:
        <http://logger.info|logger.info>("DBT run select")
        run = dbt.run(
            command=f"dbt run -s {dbt_command}",
        )
    
    return run, dbt_command

@task(name="DBT test")
def dbt_test(clock, dbt_command):
    if clock == "full_run" and dbt_command == "orders":
        <http://logger.info|logger.info>("DBT test all")
        run = dbt.run(
            command=f"dbt test",
        )
    else:
        <http://logger.info|logger.info>("DBT test select")
        run = dbt.run(
            command=f"dbt test -s {dbt_command}",
        )
    
    return run

Wei Mei

06/28/2022, 9:19 PM

In my flow: this is how I am calling the tasks.

Copy code

run = dbt_run(clock=clock, dbt_command=dbt_command)
test = dbt_test(clock=clock, dbt_command=dbt_command)

Apoorva Desai

06/28/2022, 9:37 PM

oooh, thank you! I'll try something similar and see if it works for me.

Kevin Kho

06/28/2022, 9:56 PM

Hey Apoorva, could you move the first code snippet to the thread when you get the chance to keep the main channel a bit more compact?

✅ 1

Apoorva Desai

06/28/2022, 9:57 PM

Sure:

Copy code

transform_task_schema_1 = DbtShellTask(
            profile_name="default",
            environment="prod",
            dbt_kwargs={
                "type": "snowflake",
                "threads": 10,
                "account": "xxx",
                "role": "xxx",
                "database": "xxx",
                "warehouse": "xxx",
                "schema": "schema_1",
                "user": "xxx",
                "password": xxx,
            },
            overwrite_profiles=True,
            profiles_dir="/root/.dbt/",
            helper_script=('cd /root/repo'),
            return_all=True,
            log_stderr=True
        )(command="dbt seed")

    transform_task_schema_1 = DbtShellTask(
            profile_name="default",
            environment="prod",
            dbt_kwargs={
                "type": "snowflake",
                "threads": 10,
                "account": "xxx",
                "role": "xxx",
                "database": "xxx",
                "warehouse": "xxx",
                "schema": "schema_2",
                "user": "xxx",
                "password": xxx,
            },
            overwrite_profiles=True,
            profiles_dir="/root/.dbt/",
            helper_script=('cd /root/repo'),
            return_all=True,
            log_stderr=True
        )(command="dbt seed")

Apoorva Desai

06/28/2022, 9:57 PM

@Kevin Kho Wei Mei's solution didn't really work for me

Apoorva Desai

06/28/2022, 9:58 PM

Now I am trying this inside `with Flow`:

Copy code

transform_task = DbtShellTask(
            profile_name="default",
            environment="prod",
            dbt_kwargs={
                "type": "snowflake",
                "threads": 10,
                "account": "xxx",
                "role": "xxx",
                "database": "xxx",
                "warehouse": "xxx",
                "schema": "schema_1",
                "user": "xxx",
                "password": xxx,
            },
            overwrite_profiles=True,
            profiles_dir="/root/.dbt/",
            helper_script=('cd /root/repo'),
            return_all=True,
            log_stderr=True
        )
    transform_task_1=transform_task.run(dbt_kwargs = {"schema": "schema_1"}, command="dbt seed")
    transform_task_2=transform_task.run(dbt_kwargs = {"schema": "schema_2"}, command="dbt seed")

and I get a new error:

PermissionError: [Errno 13] Permission denied: '/root/.dbt/profiles.yml'

Apoorva Desai

06/28/2022, 9:59 PM

@Kevin Kho Is it even possible to do what I am trying to do?

Kevin Kho

06/28/2022, 10:00 PM

I would do:

Copy code

transform_task_schema_1 = DbtShellTask(
            profile_name="default",
            environment="prod",
            dbt_kwargs={
                "type": "snowflake",
                "threads": 10,
                "account": "xxx",
                "role": "xxx",
                "database": "xxx",
                "warehouse": "xxx",
                "schema": "schema_1",
                "user": "xxx",
                "password": xxx,
            },
            overwrite_profiles=True,
            profiles_dir="/root/.dbt/",
            helper_script=('cd /root/repo'),
            return_all=True,
            log_stderr=True
        )
transform_task_schema_2 = copy.deepcopy(transform_task_scheme_1)
transform_task_schema_2.dbt_kwargs = {new_schema_here}

with Flow() as flow:
    transform_task_1 = transform_task_schema_1(command="dbt seed")
    transform_task_2 = transform_task_schema_2(command="dbt seed")

Kevin Kho

06/28/2022, 10:00 PM

I think that would work. Like make a copy and edit the class itself.

Apoorva Desai

06/28/2022, 10:01 PM

ah okay, let me try that. brb

Apoorva Desai

06/28/2022, 10:35 PM

The second dbtshelltask fails with:

Copy code

Task 'DbtShellTask': Exception encountered during task execution!
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 880, in get_task_run_state
    value = prefect.utilities.executors.run_task_with_timeout(
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/executors.py", line 468, in run_task_with_timeout
    return task.run(*args, **kwargs)  # type: ignore
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/tasks.py", line 456, in method
    return run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/tasks/dbt/dbt.py", line 164, in run
    dbt_kwargs = {**self.dbt_kwargs, **(dbt_kwargs or {})}
TypeError: 'set' object is not a mapping

Apoorva Desai

06/28/2022, 10:42 PM

transform_task_schema_2.dbt_kwargs = {new_schema_here}

needs to be

transform_task_schema_2.dbt_kwargs = {"schema" : new_schema_here}

I think. After doing that:

Copy code

22:40:32  Encountered an error while reading profiles:
22:40:32    ERROR: Runtime Error
  required field "type" not found in profile default and target prod
22:40:32  Defined profiles:
22:40:32   - default
22:40:32
For more information on configuring profiles, please consult the dbt docs:

<https://docs.getdbt.com/docs/configure-your-profile>

22:40:32  Encountered an error:
Runtime Error
  Could not run dbt

@Kevin Kho

Kevin Kho

06/28/2022, 10:51 PM

I think you need to pass the entire dbt kwargs dict. Not just the schema

Apoorva Desai

06/28/2022, 10:59 PM

That worked

Apoorva Desai

06/28/2022, 10:59 PM

thanks

Open in Slack

Previous Next