
Daniel Sääf

05/20/2022, 5:48 AM
Hi. I'm creating my first flow, a daily ETL flow that reads data from CSV files and writes the data to BigQuery. Now I wonder if there are any recommended ways to safeguard against duplicates being written to BigQuery if the flow is executed twice. I was thinking of using cache_key_fn so the write task isn't rerun, but I feel unsure whether that's how it's supposed to be used. (I would rather have the task be skipped..)
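(For reference, a minimal sketch of that caching approach, assuming Prefect 2's cache_key_fn together with the built-in task_input_hash; the file path and the load logic are placeholders:)
```python
import csv
from datetime import timedelta

from prefect import flow, task
from prefect.tasks import task_input_hash


@task
def read_csv(path: str) -> list:
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(days=1))
def load_to_bigquery(rows: list) -> None:
    # With task_input_hash as the cache key, rerunning the flow with the
    # same rows inside the expiration window reuses the cached result
    # instead of executing this write again.
    print(f"would write {len(rows)} rows")  # placeholder for the real load


@flow
def daily_etl(csv_path: str = "data.csv"):
    rows = read_csv(csv_path)
    load_to_bigquery(rows)
```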

Anna Geller

05/20/2022, 10:58 AM
This is usually something you tackle on the SQL side, e.g. with BigQuery's MERGE statement: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement
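(For concreteness, a rough sketch of that pattern using the google-cloud-bigquery client: load each day's CSV into a staging table, then MERGE into the target so rows that are already present are not re-inserted, which makes reruns idempotent. The project, dataset, table, and column names are all placeholders:)
```python
from google.cloud import bigquery

# Hypothetical table and column names; adjust to your schema.
MERGE_SQL = """
MERGE `my_project.my_dataset.events` AS target
USING `my_project.my_dataset.events_staging` AS source
ON target.event_id = source.event_id
WHEN NOT MATCHED THEN
  INSERT (event_id, payload, loaded_at)
  VALUES (source.event_id, source.payload, source.loaded_at)
"""

client = bigquery.Client()
client.query(MERGE_SQL).result()  # blocks until the merge completes
```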

Kevin Kho

05/20/2022, 2:12 PM
You can use the KV Store to keep track of already processed records
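(A rough sketch of that idea, assuming Prefect 1 Cloud's KV Store via prefect.backend's set_key_value / get_key_value, plus a SKIP signal since you'd rather have the task skipped; the key name and date logic are placeholders:)
```python
from prefect import task
from prefect.backend import get_key_value, set_key_value
from prefect.engine.signals import SKIP

PROCESSED_KEY = "etl_last_processed_date"  # hypothetical key name


@task
def load_to_bigquery(rows: list, csv_date: str) -> None:
    try:
        last_done = get_key_value(PROCESSED_KEY)
    except Exception:
        last_done = None  # key not created yet
    if last_done == csv_date:
        # Mark this task run as Skipped instead of re-writing the rows.
        raise SKIP(f"{csv_date} already loaded")
    # ... write rows to BigQuery here ...
    set_key_value(key=PROCESSED_KEY, value=csv_date)
```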

Daniel Sääf

05/20/2022, 2:30 PM
Thanks for the great advice!
Must say that this is one of the best community forums I've experienced!