Hey, has anyone in the community done incremental ETL with Prefect? We are considering it, and maybe there are mistakes we can avoid ^^ Thanks!
04/29/2022, 3:17 PM
hey @Florian Guily, I'm sure there are many community members who have written Prefect flows for some sort of incremental ETL - do you have any specific questions about implementing one with Prefect?
In general, I'd say that how you go about it depends heavily on your source and destination
but as Nate said, always easier to help once we know more about your use case
04/29/2022, 4:15 PM
The goal is to ingest a catalog of external data via an HTTP endpoint. This external catalog gets updated, so we also need to add the new records to our db. When looking at tools for this task, I tried Airbyte, which has a built-in feature for incremental EL on the data source. I'm new to this, but as far as I understand, the simplest way to do so is to ingest only records that are newer than a given date cursor and then update the cursor to the current date for next time.
So my first idea was to translate this in a prefecthonic manner, but I'm completely open to suggestions if people have some experience with it 🙂
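For reference, the cursor idea described above can be sketched in plain Python, independent of any orchestrator. The record shape, the `updated_at` field, and the `select_new_records` helper are assumptions made up for illustration. One deliberate variation: the sketch advances the cursor to the newest timestamp actually seen rather than to the current date, which avoids skipping records that appear between the fetch and the cursor update.

```python
from datetime import datetime, timezone

def select_new_records(records, cursor):
    """Return records updated after `cursor`, plus the cursor for the next run."""
    new = [r for r in records if r["updated_at"] > cursor]
    # Advance to the newest timestamp seen; keep the old cursor if nothing is new.
    next_cursor = max((r["updated_at"] for r in new), default=cursor)
    return new, next_cursor

# Hypothetical catalog records, as they might come back from the HTTP endpoint.
catalog = [
    {"id": 1, "updated_at": datetime(2022, 4, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2022, 4, 20, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2022, 4, 28, tzinfo=timezone.utc)},
]
cursor = datetime(2022, 4, 15, tzinfo=timezone.utc)
new_records, cursor = select_new_records(catalog, cursor)
print([r["id"] for r in new_records])  # [2, 3]
```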
04/29/2022, 4:25 PM
In Prefect Cloud, the KV Store is great for things like storing high-water marks. Basically, get the current max ID (or whatever) from there when the flow starts, use it in the query, and update it at the end. I haven't run into trouble with it so far. Of course, the high-water mark should be strictly increasing; it can get tricky with stuff like timestamps, since several records can share one.
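A minimal sketch of that get / query / update cycle, with a plain dict standing in for the KV Store so it runs anywhere. The key name, the row shape, and the `run_incremental` helper are hypothetical. In a Prefect 1.x flow on Cloud, the two dict helpers would presumably be swapped for `get_key_value` and `set_key_value` from `prefect.backend`; check the docs for your version.

```python
# Stand-in for the KV Store: a plain dict (assumption, for illustration only).
kv_store = {}

def get_mark(key, default=0):
    return kv_store.get(key, default)

def set_mark(key, value):
    kv_store[key] = value

def run_incremental(source_rows, load, key="catalog_max_id"):
    """Load rows above the stored high-water mark, then advance the mark."""
    mark = get_mark(key)
    batch = [row for row in source_rows if row["id"] > mark]
    if not batch:
        return 0
    load(batch)
    # Update the mark only after the load succeeded, so a failed run
    # is retried from the same position next time.
    set_mark(key, max(row["id"] for row in batch))
    return len(batch)

destination = []
rows = [{"id": 1}, {"id": 2}]
first = run_incremental(rows, destination.extend)   # loads ids 1 and 2
rows.append({"id": 3})
second = run_incremental(rows, destination.extend)  # loads only id 3
third = run_incremental(rows, destination.extend)   # nothing new
print(first, second, third)  # 2 1 0
```

The strict `>` comparison is why an integer ID works cleanly here; with timestamps, two records sharing the mark's value could be skipped.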
04/29/2022, 4:26 PM
cool - so one idea I have given that description is to use the Key Value store (if you're on Cloud) to maintain the state of that cursor between flow runs.
@Henning Holgersen beat me to it
04/29/2022, 4:31 PM
All great answers, just to add one more option: parametrized flows! You can check this post showing, among other things, how you can leverage parameters for such backfilling tasks
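One way the parameter idea might look, sketched without Prefect so it runs standalone: an optional start date overrides the stored cursor when a backfill is needed. The `resolve_start` helper and the `start_override` name are assumptions; in a Prefect 1.x flow the override would typically come from a `Parameter` with a default of `None`.

```python
from datetime import date

def resolve_start(stored_cursor, start_override=None):
    """Pick the lower bound for this run's extraction window."""
    if start_override is not None:
        # Parameters often arrive as strings (e.g. from a UI or API call).
        return date.fromisoformat(start_override)
    return stored_cursor

stored = date(2022, 4, 28)
print(resolve_start(stored))                # 2022-04-28
print(resolve_start(stored, "2022-01-01"))  # 2022-01-01
```

A regular scheduled run leaves the override empty and continues from the cursor; a one-off backfill passes an earlier date.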
05/02/2022, 8:00 AM
Ok, thanks for all of your responses! Super reactive as always!