Ling Chen

03/10/2022, 5:18 PM

Hi Community! Are there guidelines for freeing the memory used by previous tasks after they finish? We are running memory-intensive pipelines and are hitting out-of-memory errors. The output of each task is only needed by the next task, not by any later ones. Is there room for memory optimization?

Kevin Kho

03/10/2022, 5:20 PM

You can persist the data and then just pass the location downstream to be loaded in. Are you doing that already?

Ling Chen

03/10/2022, 5:23 PM

When you say persist the data, do you mean storing the data to disk and loading it back in the next task?

Kevin Kho

03/10/2022, 5:29 PM

Yeah, like df.to_parquet, so that it's not held in memory. Prefect holds the mapped results, so you pass the location string downstream instead of the df.
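A rough sketch of the pattern Kevin describes, using only the standard library so it runs anywhere. The functions extract, transform, and run_pipeline are hypothetical stand-ins for Prefect tasks and a flow, and pickle stands in for the Parquet file; with a pandas DataFrame the equivalent calls would be df.to_parquet(path) in the upstream task and pd.read_parquet(path) in the downstream one.

```python
import pickle
import tempfile
from pathlib import Path


def extract(out_dir: Path) -> str:
    """Build a large object, persist it, and return only its location."""
    data = list(range(100_000))  # stand-in for a memory-heavy DataFrame
    path = out_dir / "extract.pkl"
    with path.open("wb") as f:
        pickle.dump(data, f)
    return str(path)  # only this short string is passed downstream


def transform(path: str) -> int:
    """Load the upstream output from disk only when it is needed."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return sum(data)


def run_pipeline() -> int:
    """Hypothetical flow: wire the two steps together via the file path."""
    with tempfile.TemporaryDirectory() as tmp:
        location = extract(Path(tmp))
        return transform(location)
```

Because each step returns a path rather than the data itself, whatever keeps the task result around only has to hold a string.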

Ling Chen

03/10/2022, 5:37 PM

Cool, I can do that. But just to understand Prefect a bit better: is there no easy way to clear the memory held by task outputs? So Python's gc.collect() has no effect on task outputs?

Anna Geller

03/10/2022, 5:38 PM

Lastly, you can always simply delete Python objects if you no longer need them:

del df

Ling Chen

03/10/2022, 5:39 PM

Cool cool. Something like del df and gc.collect() should work then. Thanks!

Anna Geller

03/10/2022, 5:40 PM

Looks like SO Python folks also recommend just that 😄

del my_object
gc.collect()

https://stackoverflow.com/a/1316793/9509388

Kevin Kho

03/10/2022, 5:58 PM

Yep, you can, as long as you do it inside the task.
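A minimal sketch of doing the cleanup inside the task body, as suggested above. Payload and summarize are hypothetical names, and the plain function stands in for a Prefect task. The weak reference is only there to check that the object really went away; the check assumes CPython, where dropping the last reference frees an object right away. This only releases objects the function itself holds. Whatever the task returns is still kept as its result, which is why the example returns a small summary rather than the large object.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large task-local object such as a DataFrame."""

    def __init__(self, n: int) -> None:
        self.values = list(range(n))


def summarize(n: int):
    """Use a large intermediate, then drop it before returning."""
    payload = Payload(n)
    probe = weakref.ref(payload)  # lets us check whether the object is gone
    total = sum(payload.values)
    del payload   # remove the only reference held by this function
    gc.collect()  # additionally sweep up any reference cycles
    return total, probe() is None
```

Calling summarize(1000) returns the sum together with a flag saying whether the intermediate object had been released by the time the function returned.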
