Zviri
05/01/2020, 12:09 PM
When I run my flow (… flow.run) the memory consumption of my Dask workers gradually grows until they run out of memory. When I run the same flow from a script, everything finishes correctly and there are no memory issues whatsoever. I am using the latest prefect and dask packages. Has anyone experienced something similar? Is there anything that could be done about it? Thanks a lot.
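For context, this is roughly how I point the flow at the cluster (just an illustrative sketch: the task body, flow name, and scheduler address are placeholders, and the imports assume the Prefect 0.x API):

from prefect import Flow, task
from prefect.engine.executors import DaskExecutor  # Prefect 0.x import path

@task
def say_hello():
    # placeholder task; the real flow does more work than this
    return "hello"

with Flow("memory-test") as flow:
    say_hello()

# point the flow at an existing Dask cluster; the scheduler address is a placeholder
flow.run(executor=DaskExecutor(address="tcp://my-scheduler:8786"))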
Jenny
05/01/2020, 1:13 PM
Zviri
05/01/2020, 3:13 PM
Jim Crist-Harif
05/01/2020, 3:18 PM
Zviri
05/01/2020, 3:40 PM
Jim Crist-Harif
05/01/2020, 3:43 PM
Zviri
05/03/2020, 6:31 PM
I inspected the memory of my workers using heapy.
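In case anyone wants to reproduce the inspection, I looked at the worker heaps with roughly the following (a sketch: the scheduler address is a placeholder, and it assumes guppy3, which provides heapy, is installed on the workers):

from dask.distributed import Client

def heap_summary():
    # runs on the worker; needs guppy3 (which provides heapy) installed there
    from guppy import hpy
    return str(hpy().heap())

client = Client("tcp://my-scheduler:8786")  # placeholder scheduler address
for worker, summary in client.run(heap_summary).items():  # run on every worker
    print(worker)
    print(summary)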
I found out that what is clogging up the memory of my workers are these strings (there are tons of them):
<p>The error code is a string that uniquely identifies an error condition. It is meant to be read and understood by programs that detect and handle errors by type. </p>
<p class="title">
<b>Amazon S3 error codes</b>
</p>
<ul>
<li>
<ul>
<li>
<p>
<i>Code:</i> AccessDenied </p>
</li>
<li>
<p>
<i>Description:</i> Access Denied</p>
</li>
<li>
<p>
<i>HTTP Status Code:</i> 403 Forbidden</p>
</li>
<li>
<p>
<i>SOAP Fault Code Prefix:</i> Client</p>
</li>
</ul>
</li>
...
So at first, I assumed a library of mine used for accessing S3 was doing this, so I stripped down my flow to only two steps:
1. fetch a list of URLs from db
2. download them using requests but do not store the results anywhere
So now there was no code of mine doing anything with S3 (the stripped-down flow is sketched below).
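The stripped-down flow was roughly this (a sketch with placeholder names; the database query is stubbed out and the imports assume the Prefect 0.x API):

import requests
from prefect import Flow, task

@task
def fetch_urls():
    # stand-in for the real database query that returns a list of URLs
    return ["https://example.com/page1", "https://example.com/page2"]

@task
def download(url):
    # fetch the URL and deliberately throw the response away
    requests.get(url, timeout=30)

with Flow("stripped-down-test") as flow:
    urls = fetch_urls()
    download.map(urls)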
But the issue was still occurring. So I figured the only remaining culprit could be the S3 storage that I used for my flows.
To test this hypothesis I replaced the S3 storage with Local storage. As expected the issue disappeared after this.
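Concretely, the swap was along these lines (a sketch; the flow and bucket names are placeholders, and the import path follows the Prefect 0.x storage API):

from prefect import Flow
from prefect.environments.storage import S3, Local  # Prefect 0.x storage module

flow = Flow("storage-test")  # placeholder flow

# original setup: the flow gets uploaded to S3 (the configuration that leaks for me)
flow.storage = S3(bucket="my-flows-bucket")  # placeholder bucket name

# test setup: keep the flow on local disk instead (no leak with this)
flow.storage = Local()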
I have no idea what the root cause of this is, so I will continue digging, but it seems to be connected to boto3 and to originate in the S3 storage. I am using --nprocs with the workers, so it might have something to do with the forking logic.
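For anyone trying to reproduce the worker setup, the programmatic analogue of forked worker processes looks roughly like this (a sketch; I actually start the workers with the dask-worker CLI and --nprocs, and the counts here are placeholders):

from dask.distributed import Client, LocalCluster

# processes=True gives forked worker processes, similar in spirit to dask-worker --nprocs
cluster = LocalCluster(n_workers=4, threads_per_worker=1, processes=True)
client = Client(cluster)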
Jim Crist-Harif
05/05/2020, 12:58 PM
Zviri
05/05/2020, 2:32 PM
Jim Crist-Harif
05/07/2020, 5:51 PM