Marwan Sarieddine

05/27/2020, 12:14 AM
Hi Everyone, I have been facing an issue with
- mainly once I use
- my memory usage doubles. So I took the time to look through the source code and test memory usage locally, and I believe the problem is in the implementation, more specifically in the following line in
binary_data = new.serialize_to_bytes(new.value)
A new object is created here, which requires twice the memory allocation. At least that is what seems to be happening; please correct me if I am wrong. See the full
method below:
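```python
# Excerpt: the `write` method of Prefect's S3 result class, shown with the
# imports it relies on. `self.client`, `self.bucket`, `self.format` and
# `self.logger` are provided by the class itself.
import io
from typing import Any

from prefect.engine.result import Result


def write(self, value: Any, **kwargs: Any) -> Result:
    """
    Writes the result to a location in S3 and returns the resulting URI.

    Args:
        - value (Any): the value to write; will then be stored as the `value` attribute
            of the returned `Result` instance
        - **kwargs (optional): if provided, will be used to format the location template
            to determine the location to write to

    Returns:
        - Result: a new Result instance with the appropriately formatted S3 URI
    """
    new = self.format(**kwargs)
    new.value = value
    self.logger.debug("Starting to upload result to {}...".format(new.location))
    # Serializes the whole value into a new bytes object, so a second,
    # full-size copy of the data now sits in memory alongside `value`.
    binary_data = new.serialize_to_bytes(new.value)

    # Wrap the bytes in a file-like object for boto3's upload_fileobj.
    stream = io.BytesIO(binary_data)

    ## upload
    from botocore.exceptions import ClientError

    try:
        self.client.upload_fileobj(stream, Bucket=self.bucket, Key=new.location)
    except ClientError as err:
        self.logger.error("Error uploading to S3: {}".format(err))
        raise err

    self.logger.debug("Finished uploading result to {}.".format(new.location))
    return new
```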
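One way to check the doubling claim locally is to measure allocations while a large value is serialized. This is a minimal sketch using `tracemalloc`; it assumes the default serializer is pickle-based (the stdlib `pickle` module stands in here, while Prefect may use `cloudpickle`), and the 20 MiB `bytes` value is an arbitrary stand-in for a large task result.

```python
import pickle
import tracemalloc


def serialize_and_measure(value):
    """Pickle `value` and return (payload, peak), where `peak` is the peak
    number of bytes allocated while serializing (the original `value` was
    allocated before tracing started, so it is not counted)."""
    tracemalloc.start()
    try:
        payload = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return payload, peak


size = 20 * 1024 * 1024        # 20 MiB stand-in for a task result
value = bytes(size)            # already in memory before serialization
payload, peak = serialize_and_measure(value)
print(f"value: {size >> 20} MiB, payload: {len(payload) >> 20} MiB, "
      f"extra peak while serializing: {peak >> 20} MiB")
```

The serialized payload is at least as large as the value itself, so for the duration of `write` both copies are alive at once.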

Chris White

05/27/2020, 12:25 AM
Hi @Marwan Sarieddine - yes, you are correct; Prefect must convert the task return value into something that can be stored. Because Prefect places very few restrictions on the types of data that can be passed between tasks, converting the object to bytes is the most universal approach.
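To illustrate why bytes are the "universal" target, here is a small sketch, again using the stdlib `pickle` as a stand-in for Prefect's serializer (an assumption). Values that a text format such as JSON cannot represent (sets, datetimes, decimals, raw bytes) still round-trip through bytes.

```python
import json
import pickle
from datetime import datetime
from decimal import Decimal

# Example task return values of assorted types.
task_outputs = [
    42,
    {"tags": {"a", "b"}},           # a set: not representable in JSON
    datetime(2020, 5, 27, 0, 25),
    Decimal("1.10"),
    (1, b"raw"),
]

for value in task_outputs:
    blob = pickle.dumps(value)          # arbitrary picklable object -> bytes
    assert isinstance(blob, bytes)
    assert pickle.loads(blob) == value  # bytes -> an equal object

# JSON rejects the same kinds of values:
try:
    json.dumps(task_outputs[1])
except TypeError as exc:
    print(f"json: {exc}")
```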

Marwan Sarieddine

05/27/2020, 12:26 AM
Hi @Chris White - thank you for the quick response. So I guess the solution here would be for me to implement my own custom Result class, if I am that wary of memory usage?
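A minimal sketch of what the body of a custom `write` could do instead: serialize straight into a temporary file on disk and hand that file to the uploader, so the full payload never has to exist as one `bytes` object in RAM. `write_via_tempfile` and its `upload` callable are hypothetical; `upload` stands in for something like `lambda f: client.upload_fileobj(f, Bucket=bucket, Key=key)`. It assumes a pickle-based serializer, whose `dump` on recent CPython flushes output to the file in chunks (roughly 64 KiB frames with protocol 4+) rather than building the whole payload first.

```python
import pickle
import tempfile
from typing import Any, BinaryIO, Callable, TypeVar

T = TypeVar("T")


def write_via_tempfile(value: Any, upload: Callable[[BinaryIO], T]) -> T:
    """Pickle `value` into a temporary file, then pass the rewound file
    object to `upload` and return whatever `upload` returns."""
    with tempfile.TemporaryFile() as tmp:
        # Write the serialized value directly to disk.
        pickle.dump(value, tmp, protocol=pickle.HIGHEST_PROTOCOL)
        tmp.seek(0)  # rewind so the uploader reads from the start
        return upload(tmp)


# Usage: a stand-in "upload" that just reads the object back.
echo = write_via_tempfile({"rows": list(range(5))}, pickle.load)
print(echo)  # {'rows': [0, 1, 2, 3, 4]}
```

The trade-off is disk I/O in exchange for RAM; the temporary file is deleted when the `with` block exits.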

Chris White

05/27/2020, 12:30 AM
yea that’s one option; we are also starting to explore exposing the serialization protocol to users directly. I’m curious: for your use case, how do you plan to get around making a copy?
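As an illustration of what "exposing the serialization protocol" could look like, here is a sketch of a pluggable serializer interface. All names (`Serializer`, `PickleSerializer`, `JSONSerializer`, `roundtrip`) are hypothetical and not Prefect's API; a result class would call whichever serializer the user supplies instead of a fixed one.

```python
import json
import pickle
from typing import Any, Protocol


class Serializer(Protocol):
    """Hypothetical interface: anything that can turn a value into bytes and back."""

    def serialize(self, value: Any) -> bytes: ...

    def deserialize(self, blob: bytes) -> Any: ...


class PickleSerializer:
    """Handles almost any Python object."""

    def serialize(self, value: Any) -> bytes:
        return pickle.dumps(value)

    def deserialize(self, blob: bytes) -> Any:
        return pickle.loads(blob)


class JSONSerializer:
    """Only handles JSON-compatible data, but the payload is human-readable."""

    def serialize(self, value: Any) -> bytes:
        return json.dumps(value).encode("utf-8")

    def deserialize(self, blob: bytes) -> Any:
        return json.loads(blob.decode("utf-8"))


def roundtrip(value: Any, serializer: Serializer) -> Any:
    """Serialize and deserialize `value` with the user-chosen serializer."""
    return serializer.deserialize(serializer.serialize(value))


print(roundtrip({"a": [1, 2]}, JSONSerializer()))  # {'a': [1, 2]}
```

Letting users choose the serializer lets them pick one suited to their data, for example a format that compresses well or can be written incrementally.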

Marwan Sarieddine

05/27/2020, 12:33 AM
haven’t explored “elegant” solutions yet; my hunch is that I would have to delete objects after “using” them. Will get back to you on how I end up implementing this
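A small sketch of the "delete after use" idea, measured with `tracemalloc`. In CPython, `del` frees an object immediately once its last reference is gone. The 10 MiB `bytearray` and the `bytes(data)` copy are stand-ins for a task result and its serialized form. Note the caveat: this only helps if nothing else still references the value. In `write` above, for example, `new.value` keeps the original alive.

```python
import tracemalloc

tracemalloc.start()
data = bytearray(10 * 1024 * 1024)  # stand-in for a large task result (10 MiB)
held, _ = tracemalloc.get_traced_memory()

payload = bytes(data)  # stand-in for the serialized copy: two full buffers alive
both, _ = tracemalloc.get_traced_memory()

del data  # drop the last reference to the original; CPython frees it right away
after_del, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"original only: {held >> 20} MiB, with copy: {both >> 20} MiB, "
      f"after del: {after_del >> 20} MiB")
```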

Chris White

05/27/2020, 12:34 AM
gotcha, yea I’d be curious! If there are any easy wins we could implement that would help manage memory better, I’m open to it

Avi A

05/27/2020, 7:19 AM
Exposing the serialization would give users great power, that’s cool.
@Chris White perhaps if
returns a byte stream, it would not have to allocate the extra memory?
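A stream only saves memory if the bytes are produced lazily. Wrapping an already-complete `bytes` object in `io.BytesIO`, as `write` does, still requires the full copy first. Here is a minimal sketch of one way to get a genuinely lazy byte stream, assuming a pickle-based serializer: a background thread pickles into an OS pipe while the consumer reads the other end, and the bounded pipe buffer makes the producer wait for the reader. `stream_serialized` is a hypothetical helper. `consume` could wrap boto3's `upload_fileobj`, which accepts non-seekable file objects, though (assumption) it may still buffer one multipart chunk at a time.

```python
import os
import pickle
import threading
from typing import Any, BinaryIO, Callable, List, TypeVar

T = TypeVar("T")


def stream_serialized(value: Any, consume: Callable[[BinaryIO], T]) -> T:
    """Pickle `value` on a background thread into a pipe while `consume`
    reads from the other end, so the whole payload is never held at once."""
    read_fd, write_fd = os.pipe()
    errors: List[BaseException] = []

    def produce() -> None:
        try:
            with os.fdopen(write_fd, "wb") as writer:
                pickle.dump(value, writer, protocol=pickle.HIGHEST_PROTOCOL)
        except BaseException as exc:  # e.g. BrokenPipeError if the reader stops early
            errors.append(exc)

    producer = threading.Thread(target=produce, daemon=True)
    producer.start()
    try:
        with os.fdopen(read_fd, "rb") as reader:
            result = consume(reader)
    finally:
        producer.join()  # closing the reader unblocks a stuck producer
    if errors:
        raise errors[0]
    return result


# Usage: stand-in consumers that read the stream back.
value = {"rows": list(range(200_000))}
restored = stream_serialized(value, pickle.load)
raw = stream_serialized(value, lambda f: b"".join(iter(lambda: f.read(65536), b"")))
```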