    Jeffery Newburn

    1 year ago
    We are running a typical ETL transform in Prefect on a dataset that does not fit in memory, and we are looking for the best way to apply multiple transformations to the data. Currently, we have one task that:
    • Queries the database
    • Transforms the data one record at a time
    • Saves each transformed record to a file
    Does Prefect have a good way to split this across multiple tasks without overrunning memory? For example: Task A (read data) -> Task B (first transformation) -> Task C (second transformation) -> Task D (write data)?
    Samuel Hinton

    1 year ago
    We have a similar use case. Our Prefect tasks run on an external Dask executor, so a fixed number of tasks run in parallel at any one time. Each task grabs a portion of the data, processes it, and saves it out. You can see our schematic below, and it seems to work pretty well 🙂 The tasks in the schematic get the parameters and sanitise them, followed by a collection of getdata/process/save tasks.
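A minimal sketch of the chunked getdata/process/save pattern described above, written in plain Python rather than Prefect or Dask so it runs anywhere. It is not the actual flow from this thread: `ThreadPoolExecutor` stands in for the external Dask executor, and `get_data`, `process`, `save`, the chunk size, and the worker count are all hypothetical names and values chosen for illustration. The point it shows is that with a fixed number of workers, only that many chunks are held in memory at once.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CHUNK_SIZE = 1000  # hypothetical: rows handled by one task
MAX_WORKERS = 4    # hypothetical: fixed number of tasks running at once


def get_data(start, stop):
    """Stand-in for a database query that returns one slice of rows."""
    return [{"id": i, "value": i} for i in range(start, stop)]


def process(records):
    """Stand-in transformation: double each value."""
    return [{**r, "value": r["value"] * 2} for r in records]


def save(records, out_dir, start):
    """Write one chunk to its own JSON Lines file and return the path."""
    path = Path(out_dir) / f"part-{start:08d}.jsonl"
    with path.open("w") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return path


def run_chunk(start, stop, out_dir):
    """One unit of work: get a portion of the data, process it, save it."""
    return save(process(get_data(start, stop)), out_dir, start)


def run_pipeline(total_rows, out_dir, chunk_size=CHUNK_SIZE, max_workers=MAX_WORKERS):
    """Split the row range into chunks and run them on a bounded pool."""
    bounds = [
        (start, min(start + chunk_size, total_rows))
        for start in range(0, total_rows, chunk_size)
    ]
    # At most max_workers chunks are in flight, so peak memory is roughly
    # max_workers * chunk_size rows; only file paths are kept afterwards.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_chunk, s, e, out_dir) for s, e in bounds]
        return [future.result() for future in futures]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        paths = run_pipeline(total_rows=2500, out_dir=out_dir)
        print(f"wrote {len(paths)} chunk files")
```

Because each chunk is written to disk before its task returns, the intermediate data never has to pass between tasks in memory; only the small chunk boundaries and output paths do.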
    Jeffery Newburn

    1 year ago
    Oh, that is very helpful, thank you.