Matthias06/10/2020, 8:01 AM
All the runs complete successfully, but I have the feeling the memory used still does not get released. I am not really sure where to start debugging. Is there a way to force a memory release? The only other option I currently see is to force a dask-worker restart after the flow run finishes, but that feels very hacky.
distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 3.43 GB -- Worker memory limit: 4.18 GB
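For the "force a memory release" idea, a minimal sketch of the two options usually tried first, assuming a connected distributed.Client (the client object, scheduler address, and the idea of running gc on workers are illustrative here, not something from the thread):

    import gc

    def force_gc():
        """Run a full garbage-collection pass on the worker and
        return how many unreachable objects were collected."""
        return gc.collect()

    # Hypothetical usage against a running cluster:
    # from distributed import Client
    # client = Client("tcp://scheduler-address:8786")
    # per_worker = client.run(force_gc)   # {worker_address: n_collected, ...}
    # print(per_worker)
    #
    # Heavier fallback: restart all workers, dropping their memory entirely
    # client.restart()

Note that gc.collect() only helps if the memory is held by collectable Python objects; memory retained by the allocator or by C extensions will not show up here, which is why the restart route can still be necessary.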
Jim Crist-Harif06/10/2020, 4:25 PM
def count_objects():
    from pympler import muppy
    from collections import defaultdict
    from sys import getsizeof
    from distributed.utils import typename

    objs = muppy.get_objects()
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for o in objs:
        k = typename(type(o))
        counts[k] += 1
        try:
            sizes[k] += getsizeof(o)
        except Exception:
            pass
    return counts, sizes

# Run count_objects on the worker in question
output = client.run(count_objects, workers=[worker_address])
counts, sizes = output[worker_address]

# You can then analyze the output to see what's taking up space.
# You might use pandas to do this
df = pd.DataFrame.from_dict(sizes, orient="index", columns=["sizes"])
print(df.sizes.nlargest(100))
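The snippet above only ranks the sizes dict; the counts dict it also returns can be ranked the same way to spot types that are numerous but individually small. A small follow-up sketch (the counts values below are made-up illustrative numbers, not data from this thread):

    # Hypothetical example of the counts mapping returned by count_objects():
    # type name -> number of live instances on the worker
    counts = {
        "builtins.dict": 120000,
        "builtins.list": 80000,
        "numpy.ndarray": 450,
    }

    # Rank types by instance count, mirroring the size ranking above
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:10]
    for name, n in top:
        print(f"{name}: {n}")

A type with a huge instance count but modest total size can still point at the leak, e.g. many small dicts kept alive by a cache that never evicts.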
Matthias06/10/2020, 9:19 PM