Florian Guily

    4 months ago
    Hey, has anyone in the community done incremental ETL with Prefect? We are considering doing it and maybe there are mistakes that can be avoided ^^ Thanks!
    Nate

    4 months ago
    hey @Florian Guily, I'm sure there are many community members who have written Prefect flows for some sort of incremental ETL - do you have any specific questions about implementing one with Prefect? In general, I'd say that how you go about it depends heavily on your source and destination
    but as Nate said, always easier to help once we know more about your use case
    Florian Guily

    4 months ago
    The goal is to ingest a catalog of external data via an HTTP endpoint. This external catalog gets updated, so we also need to add the new records to our DB. When looking at tools for this task, I tried Airbyte, which has a built-in feature for incremental EL on the data source. I'm new to this, but as far as I understand, the simplest way to do so is to ingest only records that are newer than a given date cursor and then update the cursor to the current date for next time.
    So my first idea was to translate this in a prefecthonic manner, but I'm completely open to suggestions if people have some experience with it 🙂
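For readers following along, the cursor idea can be sketched in plain Python. This is only an illustration under assumptions: the `updated_at` field, the `select_new_records` helper, and the sample catalog are hypothetical, and the HTTP fetch is replaced by an in-memory list.

```python
from datetime import datetime, timezone


def select_new_records(records, cursor):
    """Return the records updated strictly after `cursor`, plus the next cursor.

    `updated_at` is an assumed field name; the real catalog may expose
    a different timestamp (or none at all).
    """
    fresh = [r for r in records if r["updated_at"] > cursor]
    # Advance the cursor to the newest timestamp actually seen,
    # and leave it unchanged when nothing new arrived.
    next_cursor = max((r["updated_at"] for r in fresh), default=cursor)
    return fresh, next_cursor


# Hypothetical sample standing in for the HTTP endpoint's response.
catalog = [
    {"id": 1, "updated_at": datetime(2022, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2022, 1, 5, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2022, 1, 9, tzinfo=timezone.utc)},
]

cursor = datetime(2022, 1, 3, tzinfo=timezone.utc)
fresh, cursor = select_new_records(catalog, cursor)
```

One design choice here: the sketch moves the cursor to the newest timestamp seen rather than to the current date, so a record that lands in the source between the fetch and the cursor update is not skipped on the next run.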
    Henning Holgersen

    4 months ago
    In Prefect Cloud, the KV Store is great for things like storing highwater marks. Basically, get the current max ID (or whatever) from there when the flow starts, use it in the query, and update it at the end. I haven't run into trouble with it so far. Of course, the highwater mark should be strictly increasing, so it can get tricky with stuff like timestamps.
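A rough sketch of that read, query, update cycle, for illustration only. The dictionary below is a stand-in for the KV Store, and every name in it (`kv_store`, `get_highwater`, `set_highwater`, `run_incremental_load`, the `catalog_max_id` key) is hypothetical. In Prefect 1.x Cloud the real calls should be `get_key_value` and `set_key_value` from `prefect.backend`, but check the docs for your version.

```python
# Stand-in for a persistent key-value store (assumption: in a real flow
# this would be the Prefect Cloud KV Store, not an in-memory dict).
kv_store = {}


def get_highwater(key, default=0):
    """Read the last stored highwater mark, or `default` on the first run."""
    return kv_store.get(key, default)


def set_highwater(key, value):
    """Persist the new highwater mark for the next run."""
    kv_store[key] = value


def run_incremental_load(source_rows, key="catalog_max_id"):
    """Load only rows whose ID is above the stored highwater mark."""
    mark = get_highwater(key)
    new_rows = [row for row in source_rows if row["id"] > mark]
    # ... write new_rows to the destination here ...
    # Update the mark last, so a failed load does not skip rows next time.
    if new_rows:
        set_highwater(key, max(row["id"] for row in new_rows))
    return new_rows
```

Updating the mark only after the load step means a crash mid-run leads to a retry of the same rows rather than a gap, which assumes the destination write can tolerate being repeated.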
    Nate

    4 months ago
    cool - so one idea I have given that description is to use the Key Value store (if you're on Cloud) to maintain the state of that cursor between flow runs.
    @Henning Holgersen beat me to it
    Anna Geller

    4 months ago
    All great answers, just to add one more option: parametrized flows! You can check this post showing, among other things, how you may leverage parameters for such backfilling tasks
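To illustrate the parameter idea without guessing at what the linked post contains, here is a minimal plain-Python sketch. The names `daily_windows` and `extract`, and the `date` field, are hypothetical; in a Prefect flow the window bounds would be passed in as flow parameters instead of ordinary arguments.

```python
from datetime import date, timedelta


def daily_windows(start, end):
    """Yield (window_start, window_end) pairs, one per day, covering [start, end)."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def extract(records, window_start, window_end):
    """Return the records whose `date` falls in [window_start, window_end)."""
    return [r for r in records if window_start <= r["date"] < window_end]


def backfill(records, start, end):
    """Run the same extract once per daily window, as a backfill would."""
    return {ws: extract(records, ws, we) for ws, we in daily_windows(start, end)}
```

Because each run is fully described by its two bounds, the same logic serves the regular schedule (one window ending now) and a backfill (many past windows), with no stored state required.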
    Florian Guily

    4 months ago
    OK, thanks for all of your responses! Super reactive as always!