Alex Rud

01/19/2021, 3:42 AM
Hi… Is there a way to map over a dataframe without producing some intermediary objects that bloat memory?
I have an ETL that’s something like:
read file to dataframe
iterate over dataframe (not a map) to produce a list of structs
map over structs to POST to consumer
This goes from a 150 MB file -> 300 MB dataframe -> 600 MB of structs (900 MB total in RAM)… I’d like to cut out the middleman and map my POSTs directly over the dataframe to keep memory usage down.
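
For illustration, here is a minimal sketch of the "no intermediate list" idea using only the Python standard library. It assumes the input is a CSV file. The `Record` fields, `iter_records`, `post_all`, and the `post` callable are all hypothetical stand-ins, since the thread names neither the struct nor the consumer API. With pandas, the rough equivalent would be iterating with `DataFrame.itertuples()` or reading with `read_csv(..., chunksize=...)` rather than building the full list first.

```python
import csv
import io
from dataclasses import dataclass, asdict
from typing import Callable, Iterable, Iterator, TextIO


@dataclass
class Record:
    # Hypothetical struct: the real fields are not given in the thread.
    id: int
    name: str


def iter_records(source: TextIO) -> Iterator[Record]:
    """Yield one Record at a time instead of building a full list."""
    for row in csv.DictReader(source):
        yield Record(id=int(row["id"]), name=row["name"])


def post_all(records: Iterable[Record], post: Callable[[dict], None]) -> int:
    """Send each record as soon as it is produced; return how many were sent."""
    sent = 0
    for record in records:
        post(asdict(record))  # assumed to wrap the real HTTP POST
        sent += 1
    return sent


# Stand-ins for the input file and the consumer endpoint.
sample = io.StringIO("id,name\n1,a\n2,b\n")
received = []
count = post_all(iter_records(sample), received.append)
```

Because `iter_records` is a generator, only one `Record` is alive at a time, so the struct stage stays small no matter how large the file is.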