What's the best way to incrementally load tables (e.g. in a loop) while also ensuring that, if a connection is lost, the task retry re-establishes the connection and doesn't create duplicates of previously loaded rows?
Kevin Kho
01/25/2022, 5:00 PM
You can persist a watermark in the KV Store to keep track of the last processed date, then update it after each load operation succeeds.
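A minimal sketch of that pattern, using the standard-library `sqlite3` module so it runs anywhere. Assumptions: the `KV_STORE` dict is a hypothetical stand-in for a persistent key-value store such as Prefect's KV Store; the `orders` table, its `updated_at` column, and all helper names are made up for illustration; `sqlite3.OperationalError` stands in for whatever error your real driver raises on a dropped connection. Two things prevent duplicates: rows are upserted on their primary key, and the watermark is advanced only after a batch commits, so a retry at worst rewrites one batch.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Hypothetical stand-in for a persistent KV store (e.g. Prefect's KV Store).
# A real store would survive across flow runs and task retries.
KV_STORE = {}
WATERMARK_KEY = "orders_last_updated_at"  # hypothetical key name


def get_watermark():
    return KV_STORE.get(WATERMARK_KEY, "")  # "" sorts before any ISO timestamp


def set_watermark(value):
    KV_STORE[WATERMARK_KEY] = value


def load_batch(source, target, batch_size):
    """Copy one batch of rows newer than the watermark; return the row count."""
    rows = source.execute(
        "SELECT id, updated_at, amount FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at LIMIT ?",
        (get_watermark(), batch_size),
    ).fetchall()
    if not rows:
        return 0
    # Upsert keyed on the primary key: re-running a batch overwrites, never duplicates.
    with target:  # commits on success, rolls back on exception
        target.executemany(
            "INSERT INTO orders (id, updated_at, amount) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET "
            "updated_at = excluded.updated_at, amount = excluded.amount",
            rows,
        )
    # Advance the watermark only after the batch has been committed.
    set_watermark(rows[-1][1])
    return len(rows)


def incremental_load(source_path, target_path, batch_size=2, max_attempts=3):
    """Load batches until caught up, reconnecting and resuming on connection errors."""
    total = 0
    for attempt in range(max_attempts):
        try:
            # Fresh connections on every attempt, so a retry re-establishes them.
            with closing(sqlite3.connect(source_path)) as source, \
                    closing(sqlite3.connect(target_path)) as target:
                while (n := load_batch(source, target, batch_size)):
                    total += n
                return total
        except sqlite3.OperationalError:
            if attempt == max_attempts - 1:
                raise
    return total


def make_demo_dbs(directory):
    """Create a small source table and an empty target table for the demo."""
    src = os.path.join(directory, "source.db")
    tgt = os.path.join(directory, "target.db")
    schema = "CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT, amount REAL)"
    with closing(sqlite3.connect(src)) as conn:
        conn.execute(schema)
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
            (1, "2022-01-01T00:00:00", 10.0),
            (2, "2022-01-02T00:00:00", 20.0),
            (3, "2022-01-03T00:00:00", 30.0),
        ])
        conn.commit()
    with closing(sqlite3.connect(tgt)) as conn:
        conn.execute(schema)
        conn.commit()
    return src, tgt


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, tgt = make_demo_dbs(tmp)
        print(incremental_load(src, tgt))  # 3 rows on the first run
        print(incremental_load(src, tgt))  # 0: the watermark says nothing is new
```

This assumes `updated_at` values are unique. If several rows can share a timestamp, a `LIMIT`-based batch can stop partway through a tie and the next `>` comparison skips the rest; a composite `(updated_at, id)` watermark avoids that. In a Prefect flow, the task's own retry settings would typically replace the manual retry loop.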