Sylvain Hazard
10/26/2021, 12:36 PM
Anna Geller
`LocalDaskExecutor` can only parallelize work within a single machine; in your use case, that means within a single pod. The easiest way to distribute this work would be to offload it to a distributed Dask cluster. Since you are on Kubernetes, you could follow this tutorial to set up a temporary Dask cluster. This would likely also help with memory usage, as the Dask scheduler would manage it.
Sylvain Hazard
10/26/2021, 12:49 PM
Kevin Kho
`del something; gc.collect()`. If you are passing large objects around, you can also try saving them somewhere, passing the location, and then loading them in a downstream task.
Kevin Kho
Evan Curtin
10/26/2021, 8:22 PM
Evan Curtin
10/26/2021, 8:22 PM
Kevin Kho
`distributed`)
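Kevin's two suggestions in the thread (dropping references with `del` plus `gc.collect()`, and passing a location instead of the large object itself) can be combined in a small stdlib-only sketch. Plain functions stand in for Prefect tasks, and `produce_large_object`, `save_and_return_path`, and `load_and_summarize` are hypothetical names. Pickle files in a temporary directory stand in for whatever shared storage a real deployment would use (for example, a bucket reachable from every pod). Only the file path travels between the steps.

```python
import gc
import os
import pickle
import tempfile


def produce_large_object(n):
    """Hypothetical upstream step that builds a large in-memory object."""
    return list(range(n))


def save_and_return_path(obj, directory):
    """Persist obj to disk and return only its location."""
    fd, path = tempfile.mkstemp(suffix=".pkl", dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(obj, f)
    return path


def load_and_summarize(path):
    """Hypothetical downstream step: load the object from its location and use it."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    total = sum(data)
    # Drop the only reference and collect any cyclic garbage right away.
    del data
    gc.collect()
    return total


with tempfile.TemporaryDirectory() as workdir:
    big = produce_large_object(1_000_000)
    location = save_and_return_path(big, workdir)
    # The upstream copy is no longer needed once it is on disk.
    del big
    gc.collect()
    result = load_and_summarize(location)

print(result)  # prints 499999500000
```

In CPython, an object is freed as soon as its last reference disappears, so `del` does most of the work here; `gc.collect()` additionally reclaims objects caught in reference cycles, which is where it helps most.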