Matthias
06/10/2020, 8:01 AM

distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 3.43 GB -- Worker memory limit: 4.18 GB
All the runs complete successfully, but I have the feeling the used memory does not get released. I am not really sure where to start debugging. Is there a way to force a memory release? The only other option I currently see is to force a dask-worker restart after the flow run finishes, but that feels very hacky.
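A minimal sketch of those two blunt options, assuming a connected distributed.Client named client (the scheduler address below is a placeholder); client.run() and client.restart() are standard Dask APIs, but note that gc.collect() only frees Python objects and the worker process may still hold on to the memory:

import gc
from distributed import Client

client = Client("tcp://scheduler:8786")  # placeholder scheduler address

# Ask every worker to run a garbage collection pass. This releases Python
# objects, but the OS-level process memory may not shrink.
client.run(gc.collect)

# Restart all worker processes, dropping their memory entirely. Any data
# held on the workers is lost, so only do this between flow runs.
client.restart()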
Jim Crist-Harif
06/10/2020, 4:25 PM

def count_objects():
    from pympler import muppy
    from collections import defaultdict
    from sys import getsizeof
    from distributed.utils import typename

    objs = muppy.get_objects()
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for o in objs:
        k = typename(type(o))
        counts[k] += 1
        try:
            sizes[k] += getsizeof(o)
        except Exception:
            # Some objects don't support getsizeof; skip them.
            pass
    return counts, sizes

# Run count_objects on the worker in question
output = client.run(count_objects, workers=[worker_address])
counts, sizes = output[worker_address]

# You can then analyze the output to see what's taking up space.
# You might use pandas to do this:
import pandas as pd

df = pd.DataFrame.from_dict(sizes, orient="index", columns=["sizes"])
print(df.sizes.nlargest(100))
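One rough way to use that snippet to narrow down a leak is to take a snapshot before the flow run and one after, then look at the largest per-type increases. The diff_sizes helper below is only an illustration and assumes the client, worker_address, and count_objects from the message above:

def diff_sizes(before, after, top=20):
    # 'before' and 'after' are the size dicts returned by count_objects.
    deltas = {k: after.get(k, 0) - before.get(k, 0) for k in set(before) | set(after)}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]

_, sizes_before = client.run(count_objects, workers=[worker_address])[worker_address]
# ... run the flow here ...
_, sizes_after = client.run(count_objects, workers=[worker_address])[worker_address]

for name, delta in diff_sizes(sizes_before, sizes_after):
    print(f"{name}: +{delta} bytes")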
Matthias
06/10/2020, 9:19 PM