# prefect-community
m
Hi Everyone, I have been facing an issue with `S3Result`: once I use `S3Result`, my memory usage doubles. I took the time to look through the source code and test the memory usage locally, and I believe there is an issue in the implementation, more specifically in the following line in `s3_result.py`:
binary_data = new.serialize_to_bytes(new.value)
A new object is being created here, requiring twice the memory allocation. At least that is what seems to be happening; please correct me if I am wrong. See the full `write` method below:
def write(self, value: Any, **kwargs: Any) -> Result:
        """
        Writes the result to a location in S3 and returns the resulting URI.

        Args:
            - value (Any): the value to write; will then be stored as the `value` attribute
                of the returned `Result` instance
            - **kwargs (optional): if provided, will be used to format the location template
                to determine the location to write to

        Returns:
            - Result: a new Result instance with the appropriately formatted S3 URI
        """

        new = self.format(**kwargs)
        new.value = value
        self.logger.debug("Starting to upload result to {}...".format(new.location))
        binary_data = new.serialize_to_bytes(new.value)

        stream = io.BytesIO(binary_data)

        ## upload
        from botocore.exceptions import ClientError

        try:
            self.client.upload_fileobj(stream, Bucket=self.bucket, Key=new.location)
        except ClientError as err:
            self.logger.error("Error uploading to S3: {}".format(err))
            raise err

        self.logger.debug("Finished uploading result to {}.".format(new.location))
        return new
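For anyone who wants to reproduce the observation locally, here is a minimal sketch of one way to measure it. It assumes the standard library `pickle` as a stand-in for whatever serializer `serialize_to_bytes` uses, and the helper name `measure_serialization` is hypothetical, not part of Prefect.

```python
import pickle
import tracemalloc
from typing import Any, Tuple


def measure_serialization(value: Any) -> Tuple[int, int]:
    """Return (payload size, peak bytes newly allocated while serializing).

    Hypothetical helper: `pickle` stands in for the real serializer.
    """
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        # The serialized payload is a second, independent object in memory.
        payload = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(payload), peak - baseline


if __name__ == "__main__":
    value = bytes(5_000_000)  # roughly 5 MB of data held by the caller
    size, extra = measure_serialization(value)
    print(f"payload: {size} bytes, extra memory while serializing: {extra} bytes")
```

Because `value` is created before tracing starts, the reported figure only covers memory allocated on top of the original object, which is the "second copy" described above.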
c
Hi @Marwan Sarieddine - yes you are correct; Prefect must convert the task return value into something that can be stored. Because Prefect imposes very little restriction on the types of data that can be passed around, converting the object to bytes is the most universal approach
m
Hi @Chris White - thank you for the quick response - so I guess the solution here would be for me to implement my own custom Result class, if I am that wary of memory usage?
c
yea that’s one option; we are also starting to explore exposing the serialization protocol to users directly (https://github.com/PrefectHQ/prefect/issues/2639). I’m curious: for your use case, how do you plan to get around making a copy?
m
haven’t explored “elegant” solutions yet - my hunch is that I would have to delete objects after “using” them - will get back to you on how I end up implementing this
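A rough sketch of what "deleting after use" could look like for the serialized copy, assuming CPython (where dropping the last reference frees the object immediately) and `pickle` as a stand-in serializer. The function name is hypothetical.

```python
import pickle
import tracemalloc
from typing import Any, Tuple


def traced_before_and_after_release(value: Any) -> Tuple[int, int]:
    """Return traced memory while the payload is held, and after it is deleted."""
    tracemalloc.start()
    try:
        payload = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        held, _ = tracemalloc.get_traced_memory()
        # Drop the only reference to the serialized copy once it is no longer needed
        # (in a real Result this would be after the upload has finished).
        del payload
        released, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return held, released
```

Note that this only releases the serialized copy. The original value stays alive for as long as anything still references it, for example the `value` attribute set on the returned `Result` in `write`.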
c
gotcha yea I’d be curious! If there are any easy wins that we could implement that would help manage memory better I’m open to it
a
Exposing the serialization would give users great power, that’s cool.
@Chris White perhaps if `serialize_to_bytes` returned a byte stream, it would not have to allocate the excess memory?
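One way the byte-stream idea could look, as a sketch under assumptions rather than Prefect's actual API: serialize straight into a file-like object that spills to disk past a size threshold, then hand that object to `upload_fileobj`. Here `serialize_to_stream` and `FakeS3Client` are hypothetical names, `pickle` stands in for the real serializer, and the fake client only mimics the `upload_fileobj(fileobj, Bucket=..., Key=...)` call used in `write` above.

```python
import pickle
import tempfile
from typing import Any, BinaryIO, Dict, Tuple


def serialize_to_stream(value: Any, max_in_memory: int = 1024 * 1024) -> BinaryIO:
    """Pickle `value` into a stream that rolls over to disk past `max_in_memory` bytes."""
    stream = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    pickle.dump(value, stream, protocol=pickle.HIGHEST_PROTOCOL)
    stream.seek(0)  # rewind so the consumer reads from the start
    return stream


class FakeS3Client:
    """Hypothetical in-memory stand-in for an S3 client, for local testing only."""

    def __init__(self) -> None:
        self.objects: Dict[Tuple[str, str], bytes] = {}

    def upload_fileobj(self, Fileobj: BinaryIO, Bucket: str, Key: str) -> None:
        # Read in fixed-size chunks, as a streaming uploader would.
        chunks = []
        while True:
            chunk = Fileobj.read(64 * 1024)
            if not chunk:
                break
            chunks.append(chunk)
        self.objects[(Bucket, Key)] = b"".join(chunks)


if __name__ == "__main__":
    client = FakeS3Client()
    with serialize_to_stream({"rows": list(range(1000))}) as stream:
        client.upload_fileobj(stream, Bucket="my-bucket", Key="results/example")
    print(len(client.objects[("my-bucket", "results/example")]), "bytes uploaded")
```

The trade-off is disk I/O in exchange for a bounded in-memory buffer; whether that is a net win would depend on the size of the results and the environment the flow runs in.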