Hi!
I have some questions about memory usage in Prefect.
We're running our flow with a KubernetesRun config and a LocalDaskExecutor.
The flow maps over a bunch of tasks, about 50K mapped tasks in total.
What we see is that the memory usage of the runner pod steadily increases for the entire duration of the flow. Whenever it reaches its predefined memory limit, we see a sudden drop in usage, but the flow keeps running properly.
Is this to be expected from a flow with that many tasks? Is there anything we could do to reduce memory usage?
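For context, a minimal sketch of the setup described above (assuming the Prefect 1.x API; the task body, task count, and memory limit are illustrative):

from prefect import Flow, task
from prefect.executors import LocalDaskExecutor
from prefect.run_configs import KubernetesRun

@task
def process(item):
    # placeholder for the real per-item work
    return item * 2

with Flow("mapped-flow") as flow:
    results = process.map(list(range(50_000)))

# One runner pod with a memory limit; every mapped task runs inside it.
flow.run_config = KubernetesRun(memory_limit="4Gi")
flow.executor = LocalDaskExecutor(scheduler="threads")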
Anna Geller
10/26/2021, 12:45 PM
@Sylvain Hazard LocalDaskExecutor can only parallelize work within a single machine; in your use case, that means within a single pod.
The easiest way to distribute this work would be to offload it to a distributed Dask cluster. Since you're on Kubernetes, you could follow this tutorial to set up a temporary Dask cluster. This would likely help with the memory usage, as the Dask scheduler would manage it for you.
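For illustration, a hedged sketch of what such a temporary Dask cluster could look like (assuming Prefect 1.x with dask-kubernetes; the image, resource values, and worker counts are illustrative):

from dask_kubernetes import make_pod_spec
from prefect.executors import DaskExecutor

executor = DaskExecutor(
    cluster_class="dask_kubernetes.KubeCluster",
    cluster_kwargs={
        "pod_template": make_pod_spec(
            image="prefecthq/prefect:latest",
            memory_limit="2G",
            memory_request="2G",
        )
    },
    # Scale the number of worker pods up and down with the pending work.
    adapt_kwargs={"minimum": 1, "maximum": 10},
)

# Attach it to the flow in place of the LocalDaskExecutor:
# flow.executor = executor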
Sylvain Hazard
10/26/2021, 12:49 PM
I expected the LocalDaskExecutor to run some kind of garbage collection whenever tasks end, or something of that nature. Guess I'm gonna have to make the switch to a proper DaskExecutor!
Thanks a bunch!
👍 1
Kevin Kho
10/26/2021, 2:00 PM
I think you can try explicitly using Python garbage collection to delete stuff: del something; gc.collect(). If you are passing large objects around, you can also try saving them somewhere, passing the location, and then loading it in a downstream task.
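For illustration, a rough sketch of both suggestions combined (the file paths, pickle format, and placeholder computation are hypothetical, not from the thread):

import gc
import pickle

from prefect import task

@task
def build_large_object(item):
    big = [item] * 1_000_000            # stand-in for the real expensive computation
    path = f"/shared/cache/{item}.pkl"  # illustrative shared storage location
    with open(path, "wb") as f:
        pickle.dump(big, f)
    del big        # drop the reference to the large object...
    gc.collect()   # ...and ask Python to reclaim the memory now
    return path    # pass only the location downstream, not the object itself

@task
def load_and_use(path):
    with open(path, "rb") as f:
        big = pickle.load(f)
    return len(big)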
Evan Curtin
10/26/2021, 8:19 PM
take a look at this
https://www.youtube.com/watch?v=nwR6iGR0mb0
👍 1
Kevin Kho
10/26/2021, 8:21 PM
Have you tried this for LocalDask @Evan Curtin? I am curious if it would work as well
Evan Curtin
10/26/2021, 8:22 PM
Not via Prefect, but it helped me a lot when using a Dask LocalCluster.
Evan Curtin
10/26/2021, 8:22 PM
I imagine that’s what is going on under the hood, no?
Kevin Kho
10/26/2021, 8:24 PM
That should work then, I guess. It's more like a multiprocessing pool (it does not use the Dask distributed scheduler).
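For reference, a minimal sketch of how LocalDaskExecutor is typically configured as a local pool (Prefect 1.x API assumed; the scheduler choice and worker count are illustrative):

from prefect.executors import LocalDaskExecutor

# "threads" or "processes" selects dask's local scheduler; there is no
# distributed scheduler (or dashboard) involved, so it behaves like a
# thread/process pool inside the single runner pod.
executor = LocalDaskExecutor(scheduler="processes", num_workers=4)

# flow.executor = executor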