Colin Bieberstein

11/17/2022, 3:29 AM
Hi everyone. I see lots of examples where people are using Pandas / SQLAlchemy to do an extract / load operation. What I haven’t seen are examples of how to handle data sets larger than memory for these operations. Do you advocate running PySpark or Dask clusters, or is there a mechanism to do something with ACI / ECS Fargate so that a just-in-time, just-big-enough worker can be launched?

Anna Geller

11/17/2022, 1:15 PM
hard to give recommendations without knowing the source format of the data and the destination. When you want to use pandas, loading in chunks might be a good option
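
A minimal sketch of the chunked-loading approach mentioned above, assuming a CSV source and a SQL destination (the file name, table name, and in-memory SQLite database are all illustrative; in practice the destination would be a SQLAlchemy connection to your warehouse):

```python
import sqlite3
import pandas as pd

# Create a small synthetic CSV so the sketch is self-contained
# (stand-in for a file too large to fit in memory).
with open("events.csv", "w") as f:
    f.write("id,value\n")
    f.writelines(f"{i},{i * 10}\n" for i in range(10))

conn = sqlite3.connect(":memory:")  # illustrative destination

# chunksize makes read_csv return an iterator of DataFrames,
# so only one chunk is ever held in memory at a time.
total_rows = 0
for chunk in pd.read_csv("events.csv", chunksize=4):
    chunk.to_sql("events", conn, if_exists="append", index=False)
    total_rows += len(chunk)

print(total_rows)  # 10
```

Since each chunk is written with `if_exists="append"` and then discarded, peak memory is bounded by the chunk size rather than the full file size; tuning `chunksize` trades memory use against per-batch overhead.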