Andreas Nord
09/12/2023, 6:49 AM
from prefect import task, flow
import pandas as pd
import numpy as np

@task()
def big_df():
    df = pd.DataFrame(np.ones((10**7, 10)))
    return df

@flow()
def child_flow():
    df = big_df.submit()

@flow()
def parent():
    for i in range(5):
        child_flow()
    print("lots of memory allocated here")

parent()
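For scale, here is a rough back-of-the-envelope estimate of what the script above allocates. It is only a sketch: it assumes the elements are float64 (the default dtype of `np.ones`), ignores pandas index and object overhead, and the helper `estimate_bytes` is a hypothetical name used for illustration.

```python
import struct

# Assumption: each element is a float64 (the default dtype of np.ones), 8 bytes.
BYTES_PER_FLOAT64 = struct.calcsize("d")

ROWS, COLS = 10**7, 10   # shape passed to np.ones in big_df
CHILD_FLOW_RUNS = 5      # iterations of the loop in parent()


def estimate_bytes(rows, cols, copies=1):
    """Raw array size in bytes, ignoring pandas index and object overhead."""
    return rows * cols * BYTES_PER_FLOAT64 * copies


per_frame = estimate_bytes(ROWS, COLS)
worst_case = estimate_bytes(ROWS, COLS, CHILD_FLOW_RUNS)

print(f"one DataFrame: {per_frame / 1e6:.0f} MB")
print(f"all five retained: {worst_case / 1e9:.1f} GB")
```

The second figure only applies if none of the five results is released before `parent()` reaches its `print` call.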
Marvin
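09/12/2023, 6:50 AM
import gc

@flow()
def child_flow():
    df = big_df.submit()
    del df  # drop this flow's local reference to the submitted task's future
    gc.collect()  # ask the garbage collector to run a full collection now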
This might help free the memory after each child flow completes. However, keep in mind that this is a brute-force solution and might not be the most efficient way to manage memory in Python.
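As a minimal sketch of why this only "might" help: `del` removes one name, and the object is freed only once nothing else refers to it. The example below uses plain Python with no Prefect involved. `BigPayload`, `make_and_drop`, and `make_and_keep` are hypothetical names standing in for the DataFrame and the flow body, and whether Prefect itself keeps a reference to a task result is not something this sketch shows.

```python
import gc
import weakref


class BigPayload:
    """Hypothetical stand-in for a large DataFrame."""

    def __init__(self, n):
        self.data = [0.0] * n


def make_and_drop():
    """Delete the only reference, then check whether the object survived."""
    payload = BigPayload(1000)
    ref = weakref.ref(payload)        # weak reference: does not keep payload alive
    alive_before = ref() is not None
    del payload                       # remove the only strong reference
    gc.collect()                      # also sweep any reference cycles
    alive_after = ref() is not None
    return alive_before, alive_after


def make_and_keep(holder):
    """Same steps, but another container still holds the object."""
    payload = BigPayload(1000)
    holder.append(payload)            # second strong reference, e.g. a cache
    ref = weakref.ref(payload)
    del payload                       # only the local name goes away
    gc.collect()
    return ref() is not None


print(make_and_drop())    # (True, False): the object was released
print(make_and_keep([]))  # True: still alive because holder refers to it
```

If memory stays high after `del df` and `gc.collect()`, the second case is the one to look for: something other than the local variable is still holding the data.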
Also, remember to ensure that your tasks and subflows are as granular as possible. This makes it easier to find and fix issues should a task fail.
I hope this helps, or at least makes your day slightly less miserable. If you need more help, I'm here, wallowing in my own existential dread.
Marvin
09/12/2023, 6:51 AM