Sylvain Hazard
10/26/2021, 12:36 PM
Anna Geller
10/26/2021, 12:45 PM
LocalDaskExecutor can only parallelize work within a single machine, in your use case within a single pod.
The easiest way to distribute this work would be to offload it to a distributed Dask cluster. Since you are using Kubernetes, you could follow this tutorial to set up a temporary Dask cluster. This would likely help with the memory usage, as the Dask scheduler would manage it.
Sylvain Hazard
10/26/2021, 12:49 PM
Kevin Kho
10/26/2021, 2:00 PM
del something; gc.collect(). If you are passing large objects around, you can also try saving them somewhere, passing the location, and then loading them in a downstream task.
Kevin Kho
10/26/2021, 8:21 PM
Evan Curtin
10/26/2021, 8:22 PM
Kevin Kho
10/26/2021, 8:24 PM
distributed)
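
A rough sketch of the two suggestions from the 2:00 PM message: free a large object explicitly with del and gc.collect(), and pass only its storage location to the downstream step. Plain functions stand in for tasks here, and the function names, the pickle format, and the temporary directory are all assumptions made for illustration; nothing in the thread says how the objects should actually be stored.

```python
import gc
import pickle
import tempfile
from pathlib import Path


def produce_large_object(workdir: Path) -> Path:
    """Upstream step (hypothetical): build a large object, persist it, return only its path."""
    data = list(range(1_000_000))  # stand-in for a large intermediate result
    location = workdir / "large_object.pkl"
    with location.open("wb") as fh:
        pickle.dump(data, fh)
    # Drop the local reference, then run the collector to also reclaim any cycles.
    del data
    gc.collect()
    return location


def consume_large_object(location: Path) -> int:
    """Downstream step (hypothetical): load the object from its location when needed."""
    with location.open("rb") as fh:
        data = pickle.load(fh)
    return sum(data)


# Only the small Path object travels between the two steps.
with tempfile.TemporaryDirectory() as tmp:
    path = produce_large_object(Path(tmp))
    total = consume_large_object(path)
```

With a distributed cluster the location would need to be storage that every worker can reach (for example a shared volume or an object store), since a local temporary directory is only visible on one machine.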