Alex Rud

    1 year ago
    Hi… Is there a way to map over a dataframe without producing intermediate objects that bloat memory?
    I have an ETL that's something like:
    read file to dataframe
    ->
    iterate over the dataframe (not a map) to produce a list of structs
    ->
    map over the structs to POST to the consumer
    This goes from a 150 MB file -> 300 MB dataframe -> 600 MB of structs (900 MB total in RAM)… I'd like to cut out the middleman and map my POSTs directly over the dataframe to keep memory usage down.
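
For illustration, the "cut out the middleman" idea can be sketched as a lazy pipeline in which each row becomes a struct only at the moment it is sent, so the full list of structs never exists in memory. The thread does not name a language or library, so this is a minimal Python sketch under assumptions: the `Record` fields, the `id`/`name` columns, and the `send` callable (a stand-in for the real POST) are all hypothetical, and it streams rows straight from CSV text rather than from a dataframe.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, TextIO


@dataclass
class Record:
    # Hypothetical payload shape; the real struct fields are not given in the post.
    id: int
    name: str


def iter_records(source: TextIO) -> Iterator[Record]:
    """Yield one Record at a time instead of building a full list."""
    for row in csv.DictReader(source):
        yield Record(id=int(row["id"]), name=row["name"])


def post_all(records: Iterable[Record], send: Callable[[Record], None]) -> int:
    """Send each record as it is produced; return how many were sent."""
    count = 0
    for record in records:
        send(record)  # stand-in for the real POST call (assumption)
        count += 1
    return count


# Usage with in-memory sample data and a fake sender that just collects records.
sent = []
sample = io.StringIO("id,name\n1,a\n2,b\n")
n_sent = post_all(iter_records(sample), sent.append)
```

Because `iter_records` is a generator, only one `Record` is alive at a time, apart from whatever the sender itself retains. The same shape should apply when the rows come from a dataframe's row iterator instead of a CSV reader, although the dataframe itself would still occupy memory in that case.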