Sanjay Patel

02/09/2021, 12:37 AM
Hi, we are using Prefect Core and running into an issue with memory handling. We are running in a Dask distributed environment. We are using a mapped task (over 400 simulations), but I have replaced the entire task body with a sleep(20), i.e. it does nothing, while keeping the inputs as they were. We are finding that each mapped Prefect task hangs onto its inputs for the entire flow duration. Memory usage on the Dask workers (capped at a maximum of 10 workers) slowly increases as more simulations are run on the same pod (without any output). We've narrowed the problem down to passing in any input parameters (whether used or not): memory allocation is manageable if no input parameters are passed (with only a sleep call in the task), and becomes problematic when we add the input parameters back in (still with only the sleep call). The Prefect documentation states that input caching occurs automatically. Is there a way to clear that cache and free the memory after the task has successfully executed? Thanks so much!

Kyle Moon-Wright

02/09/2021, 1:03 AM
Hey @Sanjay Patel, there's no easy answer to this question, as returning data between tasks in memory is generally considered a feature. I'd recommend taking a look at the recommendation in this thread on a similar issue, in which Michael details the backend implications of the situation from the Prefect side and recommends reducing the data passed between tasks to better allocate resources.
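
One way to read "reduce the data passed between tasks" is to persist each simulation's inputs once and hand the mapped task only a small reference, such as a file path, so that whatever is retained per task is a short string rather than the full payload. The sketch below is plain Python using only the standard library; it is not taken from the linked thread. The names `save_inputs` and `run_simulation`, the pickle file layout, and the input shape are all hypothetical. In a Prefect flow the two functions would be tasks, with `run_simulation` mapped over the list of paths, and on a Dask cluster the directory would have to be storage that every worker can reach.

```python
import os
import pickle
import tempfile
import time


def save_inputs(sim_id, inputs, directory):
    """Persist one simulation's inputs and return only the file path.

    Hypothetical helper: the path is the only thing handed downstream.
    """
    path = os.path.join(directory, f"sim_{sim_id}.pkl")
    with open(path, "wb") as f:
        pickle.dump(inputs, f)
    return path


def run_simulation(path):
    """Load the inputs inside the task and return a small result.

    The loaded payload is a local variable, so it becomes unreachable
    as soon as the function returns.
    """
    with open(path, "rb") as f:
        inputs = pickle.load(f)
    time.sleep(0.01)  # stand-in for the real simulation work
    return len(inputs["values"])  # small summary instead of the raw data


def main():
    # Assumption: a local temp directory stands in for shared storage.
    with tempfile.TemporaryDirectory() as workdir:
        paths = [
            save_inputs(i, {"values": list(range(1000))}, workdir)
            for i in range(4)
        ]
        # In a flow, this loop would be the mapped task call over `paths`.
        return [run_simulation(p) for p in paths]


if __name__ == "__main__":
    print(main())
```

Whether this actually lowers worker memory in a given deployment depends on what else holds references to the data, so it is worth measuring before and after.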

Sanjay Patel

02/09/2021, 2:24 AM
Thanks Kyle for the quick response, this is really helpful and we'll try to find a workaround with something like the options mentioned in that thread.