Hi folks!
We want to run an incremental flow in Prefect, i.e., we move some data from Snowflake to an AWS SageMaker feature store. We have this part working.
However, we want to run the flow on a regular schedule, and each time only move the rows in Snowflake that have not already been moved to AWS by a previous run. Is there a good way to schedule incremental runs like this in Prefect itself?
If someone knows of an example to point us to of how this is done, we'd be grateful 🙂
Kevin Kho
12/30/2021, 5:21 PM
For this type of setup in general, you need to persist the “last processed time” somewhere. When the Flow runs, it looks up the last processed time, builds a query that fetches only the unprocessed data, moves it, and then updates that last processed time. This is one of the reasons we released the KV Store.
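A minimal sketch of the query side of that pattern, with hypothetical table and column names (`FEATURES`, `UPDATED_AT`) standing in for whatever your Snowflake schema actually uses:

```python
# Hypothetical names for illustration; replace with your real schema.
TABLE = "FEATURES"
CURSOR_COLUMN = "UPDATED_AT"

def build_incremental_query(last_processed: str) -> str:
    """Build a query that selects only rows newer than the persisted
    'last processed time' (an ISO-8601 timestamp string)."""
    return (
        f"SELECT * FROM {TABLE} "
        f"WHERE {CURSOR_COLUMN} > '{last_processed}' "
        f"ORDER BY {CURSOR_COLUMN}"
    )

def next_cursor(rows) -> str:
    """After a successful load, the new cursor is the max timestamp
    among the rows that were just moved."""
    return max(row["updated_at"] for row in rows)
```

Ordering by the cursor column and taking the max of what was moved keeps the cursor consistent with the rows actually loaded, rather than with the wall clock at run time.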
Kevin Kho
12/30/2021, 5:21 PM
You can use the KV Store to fetch the last processed time at the start of your Flow and update it at the end of your Flow. If the Flow fails, the key won’t get updated, and the next run will still pull the unprocessed data.