Hey guys I ve a question for all the dbt data model mastermi Prefect Community #data-tricks-and-tips

Hey guys, I've a question for all the dbt data mod...

Andreas Nigg

10/28/2022, 1:20 PM

Hey guys, I've a question for all the dbt data model masterminds out there (I hope this is the right channel). I’ve a rather huge source table which is partitioned by a column “loaded_at”. I have an incremental model which reads from this source table. To exclude source table partitions to read, I could make use of the _dbt_max_partitions scripting variable - something like below

Copy code

{% if is_incremental() %}
    where loaded_at>= coalesce(_dbt_max_partition, '2022-01-01')
  {% else %}

But, the problem is, that in my incremental model I do not partition by “loaded_at” but by a different column (due to use-case demands). So _dbt_max_partition would not help here, as it would simply return the maximum partition value of the model (which I can’t use as filter for the source table). In “native” BigQuery I would simply use a scripting variable as follows

Copy code

declare max_source_partition timestamp;
set max_source_partition = (select max(loaded_at) as ts from `my_model_table`);
select * from `my_source_table` where loaded_at > max_source_partition

How can one implement such a scenario with dbt? Is there a way to create scripting variables as part of my models? Or do I need to add it as a on-start-hook? Or any better strategies to exclude partitions in my source without having the same column as partition field in my model?

Kalise Richmond

10/28/2022, 9:40 PM

Hey @Andreas Nigg, you should ask in our #C0470L5UV2M channel

blob attention gif 1

5 Views

Open in Slack

Previous Next