Marwan Sarieddine
05/27/2020, 12:14 AM
Mainly, once I use S3Result my memory usage is doubled, so I took the time to look through the source code and to test the memory usage locally, and I believe there is an issue in the implementation, more specifically in the following line in s3_result.py:

binary_data = new.serialize_to_bytes(new.value)

A new object is being created here, requiring twice the memory allocation. At least this is what seems to me to be happening; please correct me if I am wrong.
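As a minimal sketch of how the doubling can be reproduced locally (this assumes a cloudpickle-based serializer, like Prefect's default PickleSerializer; the payload size is arbitrary):

import io
import tracemalloc

import cloudpickle

value = bytes(200 * 1024 * 1024)  # ~200 MB payload, standing in for new.value

tracemalloc.start()
# Mirrors the two lines in write(): serialization materializes a second
# full-size object before anything is streamed to S3.
binary_data = cloudpickle.dumps(value)
stream = io.BytesIO(binary_data)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("peak traced memory: {:.0f} MB".format(peak / 1024 ** 2))
# Expect at least the payload size again, on top of the original value.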
See the full write method below:
def write(self, value: Any, **kwargs: Any) -> Result:
    """
    Writes the result to a location in S3 and returns the resulting URI.

    Args:
        - value (Any): the value to write; will then be stored as the `value` attribute
            of the returned `Result` instance
        - **kwargs (optional): if provided, will be used to format the location template
            to determine the location to write to

    Returns:
        - Result: a new Result instance with the appropriately formatted S3 URI
    """
    new = self.format(**kwargs)
    new.value = value

    self.logger.debug("Starting to upload result to {}...".format(new.location))
    binary_data = new.serialize_to_bytes(new.value)

    stream = io.BytesIO(binary_data)

    ## upload
    from botocore.exceptions import ClientError

    try:
        self.client.upload_fileobj(stream, Bucket=self.bucket, Key=new.location)
    except ClientError as err:
        self.logger.error("Error uploading to S3: {}".format(err))
        raise err

    self.logger.debug("Finished uploading result to {}.".format(new.location))
    return new
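For comparison, one possible lower-memory variant is sketched below. This is hypothetical, not Prefect's implementation, and assumes the serializer can write incrementally to a file-like object, as pickle-style serializers allow:

import tempfile

import boto3
import cloudpickle

def write_streamed(value, bucket, key):
    # Hypothetical sketch, not the Prefect API: serialize straight into a
    # temp file so the full serialized payload never has to sit in RAM
    # alongside the original value. (tempfile.SpooledTemporaryFile could
    # keep small payloads in memory instead of always touching disk.)
    client = boto3.client("s3")
    with tempfile.TemporaryFile() as stream:
        cloudpickle.dump(value, stream)
        stream.seek(0)
        client.upload_fileobj(stream, Bucket=bucket, Key=key)

The trade-off is some disk I/O in exchange for dropping the second in-memory copy.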
Chris White
05/27/2020, 12:25 AM

Marwan Sarieddine
05/27/2020, 12:26 AM

Chris White
05/27/2020, 12:30 AM

Marwan Sarieddine
05/27/2020, 12:33 AM

Chris White
05/27/2020, 12:34 AM

Avi A
05/27/2020, 7:19 AM
If serialize_to_bytes returned a byte stream, it would not have to allocate the excessive memory?
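Sketching the difference being described here (hypothetical and simplified; assumes cloudpickle, whose dump() can write incrementally to a file-like object):

import io

import cloudpickle

value = {"payload": bytes(100 * 1024 * 1024)}

# Current pattern: dumps() materializes the entire serialized payload as a
# bytes object before it is wrapped in a stream.
binary_data = cloudpickle.dumps(value)
stream = io.BytesIO(binary_data)

# The stream-oriented alternative: dump() writes incrementally into any
# file-like target (a temp file here), so no second full-size bytes object
# has to exist alongside the original value.
with open("result.pkl", "wb") as f:
    cloudpickle.dump(value, f)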