Sylvain Hazard 10/26/2021, 12:36 PM
Anna Geller 10/26/2021, 12:45 PM
can only parallelize work within a single machine, which in your use case means within a single pod. The easiest way to distribute this work would be to offload it to a distributed Dask cluster. If you are on Kubernetes, you could follow this tutorial to set up a temporary Dask cluster. This would likely help with the memory usage, as the Dask scheduler would manage it.
Sylvain Hazard 10/26/2021, 12:49 PM
Kevin Kho 10/26/2021, 2:00 PM
. If you are passing large objects around, you can also try saving them somewhere, passing the location, and then loading them in a downstream task.
del something; gc.collect()
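A minimal sketch of the two suggestions above, using only the standard library. The names `produce`, `consume`, and `big.pkl` are hypothetical, and a local temporary directory stands in for whatever storage your tasks can all reach. This is one way it might look, not a definitive implementation:

```python
import gc
import pickle
import tempfile
from pathlib import Path


def produce(workdir: Path) -> Path:
    """Upstream step: build a large object, persist it, return only its location."""
    big = list(range(1_000_000))  # stand-in for a large intermediate result
    path = workdir / "big.pkl"    # hypothetical file name
    with path.open("wb") as f:
        pickle.dump(big, f)
    del big       # drop this step's reference to the large object
    gc.collect()  # request a collection pass (mostly matters for reference cycles)
    return path


def consume(path: Path) -> int:
    """Downstream step: load the object from the location it was handed."""
    with path.open("rb") as f:
        data = pickle.load(f)
    return sum(data)


with tempfile.TemporaryDirectory() as tmp:
    location = produce(Path(tmp))  # only a small Path crosses the step boundary
    total = consume(location)
```

Only the path is passed between the two steps, so the large list is never held by both at once. If the steps run in different pods, the location would need to point at storage both can access, which this sketch does not cover.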
Kevin Kho 10/26/2021, 8:21 PM
Evan Curtin 10/26/2021, 8:22 PM
Kevin Kho 10/26/2021, 8:24 PM