# prefect-community
Hi Everyone, I have been facing an issue with `S3R...`: mainly, once I use it, my memory usage doubles. So I took the time to look through the source code and test the memory usage locally, and I believe there is an issue in the implementation, more specifically in the following line of the `write` method:
```python
binary_data = new.serialize_to_bytes(new.value)
```
A new `bytes` object is created here, requiring twice the memory allocation (at least, that is what seems to be happening; please correct me if I am wrong). See the full `write` method below:
```python
import io
from typing import Any

from prefect.engine.result import Result


def write(self, value: Any, **kwargs: Any) -> Result:
    """
    Writes the result to a location in S3 and returns the resulting URI.

    Args:
        - value (Any): the value to write; will then be stored as the `value` attribute
            of the returned `Result` instance
        - **kwargs (optional): if provided, will be used to format the location template
            to determine the location to write to

    Returns:
        - Result: a new Result instance with the appropriately formatted S3 URI
    """
    new = self.format(**kwargs)
    new.value = value
    self.logger.debug("Starting to upload result to {}...".format(new.location))
    # Serialize the whole value into one in-memory bytes object: while it exists,
    # both `value` and its serialized copy are held in memory.
    binary_data = new.serialize_to_bytes(new.value)

    # Wrap the bytes in a file-like object, which upload_fileobj expects.
    stream = io.BytesIO(binary_data)

    ## upload
    from botocore.exceptions import ClientError

    try:
        self.client.upload_fileobj(stream, Bucket=self.bucket, Key=new.location)
    except ClientError as err:
        self.logger.error("Error uploading to S3: {}".format(err))
        raise err

    self.logger.debug("Finished uploading result to {}.".format(new.location))
    return new
```
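One way to test the doubling locally is to measure how much memory is allocated while the value is serialized. This is a minimal sketch under stated assumptions: the standard-library `pickle.dumps` stands in for `serialize_to_bytes`, and `serialization_overhead` is a hypothetical helper. Because `tracemalloc` only traces allocations made after `tracemalloc.start()`, the reported peak covers the serialized copy, not the value that already existed.

```python
import pickle
import tracemalloc
from typing import Any


def serialization_overhead(value: Any) -> int:
    """Return the peak bytes allocated while serializing `value` to a bytes object.

    `value` was allocated before tracing started, so the result counts only the
    extra memory needed for the serialized copy.
    """
    tracemalloc.start()
    try:
        payload = pickle.dumps(value)  # stand-in for serialize_to_bytes: a full second copy
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del payload  # release the copy before returning
    return peak


if __name__ == "__main__":
    data = bytes(64 * 1024 * 1024)  # a 64 MiB value already held in memory
    extra = serialization_overhead(data)
    print(f"value: {len(data) / 2**20:.0f} MiB, extra while serializing: {extra / 2**20:.0f} MiB")
```

If the extra memory reported is at least the size of the value, that is consistent with the serialized copy roughly doubling peak usage during `write`.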
Hi @Marwan Sarieddine, yes, you are correct: Prefect must convert the task's return value into something that can be stored. Because Prefect imposes very few restrictions on the types of data that can be passed around, converting the object to bytes is the most universal approach.
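As a small illustration of why bytes are the universal target, the sketch below uses the standard-library `pickle` module as a stand-in for whatever serializer the result is configured with (an assumption; it is not necessarily what Prefect uses). A value mixing several types becomes a single `bytes` object, and deserializing it yields an equal but independent object, which is also why the serialized form is a separate copy in memory.

```python
import pickle
from datetime import datetime
from decimal import Decimal

# A task result made of mixed built-in and standard-library types.
value = {
    "created": datetime(2020, 6, 1, 12, 0),
    "amount": Decimal("19.99"),
    "tags": {"etl", "s3"},
    "rows": [(1, "a"), (2, "b")],
}

payload = pickle.dumps(value)     # arbitrary object -> bytes
restored = pickle.loads(payload)  # bytes -> an equal but independent object

print(type(payload).__name__, len(payload))
print(restored == value, restored is value)  # True False
```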
Hi @Chris White, thank you for the quick response. So I guess the solution here would be for me to implement my own custom Result class, if I am that wary of memory usage?
Yeah, that’s one option; we are also starting to explore exposing the serialization protocol to users directly: https://github.com/PrefectHQ/prefect/issues/2639. I’m curious: for your use case, how do you plan to get around making a copy?
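To make the idea of a user-facing serialization protocol concrete, here is a minimal sketch. Everything in it is hypothetical and is not Prefect's API: `Serializer` is an assumed interface with `serialize`/`deserialize` methods, and `PickleSerializer`, `JSONSerializer`, and `round_trip` are illustrative names. The point is only that a result could accept any object with this shape, letting users trade generality for size or format.

```python
import json
import pickle
from typing import Any, Protocol


class Serializer(Protocol):
    """Hypothetical user-facing serialization interface (not Prefect's API)."""

    def serialize(self, value: Any) -> bytes: ...

    def deserialize(self, data: bytes) -> Any: ...


class PickleSerializer:
    """Handles almost any Python object."""

    def serialize(self, value: Any) -> bytes:
        return pickle.dumps(value)

    def deserialize(self, data: bytes) -> Any:
        return pickle.loads(data)


class JSONSerializer:
    """Human-readable output, but only for JSON-compatible values."""

    def serialize(self, value: Any) -> bytes:
        return json.dumps(value).encode("utf-8")

    def deserialize(self, data: bytes) -> Any:
        return json.loads(data.decode("utf-8"))


def round_trip(value: Any, serializer: Serializer) -> Any:
    """Serialize then deserialize `value` with the chosen strategy."""
    return serializer.deserialize(serializer.serialize(value))


record = {"id": 7, "values": [1.5, 2.5]}
for s in (PickleSerializer(), JSONSerializer()):
    print(type(s).__name__, round_trip(record, s) == record)
```

A structural `Protocol` keeps the interface open: any class with matching methods qualifies without inheriting from a base class.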
I haven’t explored “elegant” solutions yet; my hunch is that I would have to delete objects after “using” them. I’ll get back to you on how I end up implementing this.
Gotcha, yeah, I’d be curious! If there are any easy wins we could implement to help manage memory better, I’m open to them.
Exposing the serialization protocol would give users a lot of power; that’s cool.
@Chris White, perhaps if the serialization step returned a byte stream, it would not have to allocate the extra memory?
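One way to explore the byte-stream idea is to serialize directly into a file object instead of building a `bytes` value first. This is a sketch under assumptions: `write_streaming` is a hypothetical helper, the standard-library `pickle` stands in for the result's serializer, and `upload` is any callable that accepts an open binary file, for example a wrapper around the boto3 client's `upload_fileobj` as called in the `write` method. With pickle protocol 4 or higher, `pickle.dump` writes to the file in frames as it goes, so the full payload should not need to exist as one in-memory object.

```python
import pickle
import tempfile
from typing import Any, BinaryIO, Callable


def write_streaming(value: Any, upload: Callable[[BinaryIO], None]) -> None:
    """Serialize `value` into a temporary file and pass the open file to `upload`.

    With pickle protocol 4+, `pickle.dump` writes to the file frame by frame,
    so the serialized payload is never built as one large bytes object.
    """
    with tempfile.TemporaryFile() as buffer:  # disk-backed, deleted on close
        pickle.dump(value, buffer, protocol=pickle.HIGHEST_PROTOCOL)
        buffer.seek(0)  # rewind so the uploader reads from the beginning
        upload(buffer)


# Hypothetical wiring into the S3 upload from the thread (not Prefect's API):
# write_streaming(value, lambda f: client.upload_fileobj(f, Bucket=bucket, Key=key))
```

The trade-off is disk space and extra I/O in exchange for lower peak memory; whether that is worthwhile depends on the size of the task results.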