
Florian Guily

04/29/2022, 2:58 PM
Hey, has anyone in the community done incremental ETL with Prefect? We are considering it, and maybe there are mistakes that can be avoided ^^ Thanks!

Nate

04/29/2022, 3:17 PM
Hey @Florian Guily, I'm sure there are many community members who have written Prefect flows for some sort of incremental ETL. Do you have any specific questions about implementing one with Prefect? In general, I'd say that how you go about it depends heavily on your source and destination.
But as Nate said, it's always easier to help once we know more about your use case.

Florian Guily

04/29/2022, 4:15 PM
The goal is to ingest a catalog of external data via an HTTP endpoint. This external catalog gets updated, so we also need to add the new records to our DB. When looking at tools for this task, I tried Airbyte, which has a built-in feature for incremental EL on the data source. I'm new to this, but as far as I understand, the simplest way to do it is to ingest only records that are newer than a given date cursor, and then update the cursor to the current date for next time.
So my first idea was to translate this in a Prefecthonic manner, but I'm completely open to suggestions if people have some experience with it 🙂
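A minimal sketch of the date-cursor approach described above, in plain Python. The record shape (`id`, `updated_at`) and the sample catalog are hypothetical stand-ins for whatever the HTTP endpoint actually returns. One design choice differs slightly from the description: the cursor is advanced to the newest `updated_at` that was actually ingested rather than to the current time, so records modified while a run is in progress are not skipped.

```python
from datetime import datetime, timezone

# Hypothetical records, standing in for what the external catalog endpoint returns.
CATALOG = [
    {"id": 1, "updated_at": datetime(2022, 4, 27, 9, 0, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2022, 4, 28, 12, 30, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2022, 4, 29, 8, 15, tzinfo=timezone.utc)},
]


def extract_since(records, cursor):
    """Return only the records modified strictly after the cursor."""
    return [r for r in records if r["updated_at"] > cursor]


def advance_cursor(batch, cursor):
    """Move the cursor to the newest timestamp actually ingested.

    If the batch is empty, the old cursor is kept unchanged.
    """
    return max((r["updated_at"] for r in batch), default=cursor)


# One incremental run: pull what is new since the last cursor, then move the cursor.
cursor = datetime(2022, 4, 28, 0, 0, tzinfo=timezone.utc)
batch = extract_since(CATALOG, cursor)
cursor = advance_cursor(batch, cursor)
print([r["id"] for r in batch])  # [2, 3]
print(cursor.isoformat())        # 2022-04-29T08:15:00+00:00
```

Because the filter uses a strict `>`, a record that appears later with exactly the cursor's timestamp would be missed; that is the kind of timestamp trickiness worth keeping in mind when picking the cursor column.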

Henning Holgersen

04/29/2022, 4:25 PM
In Prefect Cloud, the KV Store is great for things like storing high-water marks. Basically, get the current max ID (or whatever) from there when the flow starts, use it in the query, and update it at the end. I haven't run into any trouble with it so far. Of course, the high-water mark should be strictly increasing; it can get tricky with stuff like timestamps.

Nate

04/29/2022, 4:26 PM
Cool, so one idea I have, given that description, is to use the Key Value Store (if you're on Cloud) to maintain the state of that cursor between flow runs.
@Henning Holgersen beat me to it
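A minimal sketch of the high-water-mark pattern described in the two messages above: read the mark when the run starts, use it to filter, and write it back only after the load succeeds. A plain dict stands in for the Cloud KV Store so the snippet runs on its own; in Prefect 1.x Cloud, `get_key_value` and `set_key_value` from `prefect.backend` would be the natural replacements (an assumption worth checking against your Prefect version). The key name `catalog_high_water_mark` and the helper functions are hypothetical, and the pattern relies on IDs being strictly increasing.

```python
from typing import Dict, List

# Hypothetical key name for the mark; any stable string works.
HWM_KEY = "catalog_high_water_mark"


def fetch_rows_after(source: List[dict], last_id: int) -> List[dict]:
    """Stand-in for a query like `... WHERE id > :last_id ORDER BY id`."""
    return sorted((r for r in source if r["id"] > last_id), key=lambda r: r["id"])


def run_incremental(source: List[dict], destination: List[dict], store: Dict[str, int]) -> int:
    # 1. Read the mark at the start of the run (0 on the very first run).
    last_id = store.get(HWM_KEY, 0)
    # 2. Use it in the query so only new rows are pulled.
    new_rows = fetch_rows_after(source, last_id)
    # 3. Load them (here: append to a list standing in for the target table).
    destination.extend(new_rows)
    # 4. Advance the mark only after the load, and only if something was loaded.
    #    If the load raises, this line is never reached and the next run
    #    starts again from the old mark.
    if new_rows:
        store[HWM_KEY] = new_rows[-1]["id"]
    return len(new_rows)


# Usage: the `store` dict plays the role of the KV Store across runs.
store: Dict[str, int] = {}
source = [{"id": 1}, {"id": 2}, {"id": 3}]
target: List[dict] = []
first = run_incremental(source, target, store)   # loads ids 1-3
source.append({"id": 4})
second = run_incremental(source, target, store)  # loads only id 4
third = run_incremental(source, target, store)   # nothing new, mark unchanged
```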

Anna Geller

04/29/2022, 4:31 PM
All great answers! Just to add one more option: parametrized flows. You can check this post, which shows, among other things, how you can leverage parameters for backfilling tasks like this.
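A rough sketch of how parameters could drive a backfill: optional `start`/`end` values override the stored cursor when supplied, and otherwise the run simply continues from the cursor up to today. In a Prefect flow these would be flow parameters; the names `resolve_window` and `daily_chunks` are hypothetical, and the sketch keeps only the plain-Python logic so it runs without Prefect installed.

```python
from datetime import date, timedelta
from typing import Iterator, Optional, Tuple


def resolve_window(stored_cursor: date, today: date,
                   start: Optional[date] = None,
                   end: Optional[date] = None) -> Tuple[date, date]:
    """Pick the half-open window [start, end) to ingest.

    Explicit parameters override the stored cursor (backfill run);
    with no parameters, continue from the cursor up to today (normal run).
    """
    window_start = start if start is not None else stored_cursor
    window_end = end if end is not None else today
    if window_start > window_end:
        raise ValueError("start must not be after end")
    return window_start, window_end


def daily_chunks(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Split a window into one-day slices so a large backfill can run piecewise."""
    day = start
    while day < end:
        yield day, min(day + timedelta(days=1), end)
        day += timedelta(days=1)


# Normal run: no parameters, so the stored cursor decides where to start.
normal = resolve_window(date(2022, 4, 28), date(2022, 4, 29))
# Backfill run: explicit parameters replay an older range in daily slices.
backfill = resolve_window(date(2022, 4, 28), date(2022, 4, 29),
                          start=date(2022, 1, 1), end=date(2022, 1, 4))
chunks = list(daily_chunks(*backfill))
```

A backfill run like this would typically leave the stored cursor untouched, so the regular incremental runs carry on unaffected.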

Florian Guily

05/02/2022, 8:00 AM
OK, thanks for all of your responses! Super responsive as always!