Hi.
I'm creating my first flow: a daily ETL flow that reads data from CSV files and writes the data to BigQuery.
But now I wonder if there is a recommended way to safeguard against duplicates being written to BigQuery if the flow is executed twice.
I was thinking of using `cache_key_fn` to avoid rerunning the write task, but I'm unsure if that's how it's supposed to be used. (I would rather have the task be skipped.)
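For context, this is roughly what I had in mind: a minimal sketch assuming Prefect 2.x, using `task_input_hash` as the `cache_key_fn`. The task and function names (`read_rows`, `write_to_bigquery`, `daily_etl`) are just placeholders, and the write step is stubbed out.

```python
import csv
from datetime import timedelta

from prefect import flow, task
from prefect.tasks import task_input_hash


@task
def read_rows(csv_path: str) -> list[dict]:
    # Extract step: read the daily CSV into a list of row dicts.
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(days=1))
def write_to_bigquery(rows: list[dict]) -> None:
    # Placeholder for the actual BigQuery load. With cache_key_fn set,
    # rerunning the flow with identical inputs within the expiration window
    # returns the cached completed state instead of re-executing this body.
    print(f"Would write {len(rows)} rows to BigQuery")


@flow
def daily_etl(csv_path: str) -> None:
    rows = read_rows(csv_path)
    write_to_bigquery(rows)


if __name__ == "__main__":
    daily_etl("data.csv")
```

Is relying on caching like this reasonable for idempotency, or is there a better pattern for this?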