Hi community, what is the best practice for implementing parallel dbt jobs using Prefect 1.0? Basically, I need to pass a different client_id to each dbt job and trigger a dbt run per client_id (pseudocode as follows):
```python
with Flow("dbt-runs") as flow:
    for i in id_list:
        dbt_run_function(i)
```
(I’m wondering whether a simple for loop like this would achieve parallelism?)
If we don’t have a DaskExecutor, will mapping still run in parallel? We are using Snowflake for data warehousing
Chu
07/19/2022, 7:52 PM
I’m wondering if mapping is the same as a for loop? (no reduce step needed)
Kevin Kho
07/19/2022, 9:22 PM
Mapping will just run sequentially if you have no parallelism
Kevin Kho
07/19/2022, 9:22 PM
But yes it is the DAG equivalent of the for loop
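As a plain-Python analogy (not Prefect itself; the function body and client ids here are illustrative), the difference between the sequential for loop and a fanned-out map can be sketched with `concurrent.futures`:

```python
from concurrent.futures import ThreadPoolExecutor

# hypothetical stand-in for the real dbt task; in practice this would
# shell out to `dbt run` with the client_id passed in as a variable
def dbt_run_function(client_id):
    return f"dbt run finished for client {client_id}"

id_list = [101, 102, 103]  # illustrative client ids

# for-loop version: each dbt run happens one after another
sequential = [dbt_run_function(i) for i in id_list]

# mapped version: the same calls fan out across worker threads,
# which is what Prefect mapping does once a parallel executor is attached
with ThreadPoolExecutor(max_workers=len(id_list)) as pool:
    parallel = list(pool.map(dbt_run_function, id_list))

# both produce the same results; only the execution schedule differs
assert sequential == parallel
```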
Chu
07/19/2022, 10:14 PM
To ensure parallelism, I need to add a DaskExecutor, for example, right?
Kevin Kho
07/19/2022, 10:16 PM
Or a LocalDaskExecutor, yep
Chu
07/19/2022, 10:24 PM
Thank you Kevin! I haven’t used a Dask executor before; will that incur fees or require some computing infra? Or can I just import it in Python and it will work?
Kevin Kho
07/19/2022, 11:01 PM
No fees, because we bill per task anyway. You can just use it