
    Matt Denno

    1 year ago
    Hello, I am having an issue that relates to serializing my results and setting checkpoint=False. I have a task that returns a Google Protobuf object, something like:
    @task(checkpoint=False)
    def fetch() -> ProtoBufObject:
        return fetch_proto_buff_object()
    In the past I have set checkpoint=False to avoid serialization errors with the Protobuf objects, but now, even with checkpoint=False, I am getting the following when I try to register the flow:
    TypeError: can't pickle google.protobuf.pyext._message.MessageDescriptor objects
    To resolve this I tried to create a serializer for the Protobuf like so:
    from typing import Any

    import amanzi_pb2  # protobuf-generated module
    from prefect.engine.serializers import Serializer

    class PBSerializer(Serializer):
        def serialize(self, value: Any) -> bytes:
            # transform a Python object into bytes
            return value.SerializeToString()
    
        def deserialize(self, value: bytes) -> Any:
            # recover a Python object from bytes
            ts = amanzi_pb2.TimeSeries()
            ts.ParseFromString(value)
            return ts
    and called like:
    @task(checkpoint=False, result=LocalResult(serializer=PBSerializer()))
    def fetch() -> ProtoBufObject:
        return fetch_proto_buff_object()
    But I still get the same error. It doesn't seem like the result serializer is being used, but it is also not causing any errors. I have tried to debug and step through the code but can't figure out what is happening. Any help, guidance, or ideas for how to resolve this would be welcome. I am on v0.13.19.

    Michael Adkins

    1 year ago
    Hi @Matt Denno -- is this error occurring at registration or at runtime?

    Matt Denno

    1 year ago
    Hi Michael - It is happening at registration. The flow runs fine if I just flow.run().

    Michael Adkins

    1 year ago
    This doesn't seem to have anything to do with the checkpointing/results then; those would only throw errors at flow runtime. Do you have a global variable holding a MessageDescriptor object somewhere in your flow / task files?
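    (For illustration only -- a hypothetical module-level global of the kind being asked about here, reusing the amanzi_pb2 module from the serializer example; the variable name is made up:)
    import amanzi_pb2  # protobuf-generated module

    # A module-level reference like this gets pulled into the pickled flow
    # code by cloudpickle, and descriptor objects can't be pickled.
    TIMESERIES_DESCRIPTOR = amanzi_pb2.TimeSeries.DESCRIPTOR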

    Matt Denno

    1 year ago
    Hmm, OK, that is good to know - that it is probably not really related to the checkpoint/results. I guess we had a different Protobuf issue in the past that caused us to need checkpoint=False, and I thought it might be related to the current issue. I am not sure if there is a global variable holding a message descriptor somewhere, but I will for sure look into that. Thanks.

    Florian K. (He/Him)

    1 year ago
    Hi, I am working with @Matt Denno on this project. I set up a simple test module and found that calling Flow.register() works just fine when we remove the typing from the tasks that use Google Protobuf arguments. I am also not 100% sure if our PB messages hold a MessageDescriptor object. A quick check shows me that they hold a FileDescriptor object, which itself extends DescriptorBase. However, I am not declaring a global variable that holds a PB message. Now that we have identified typing as the culprit, I am wondering if this is a bug or a result of Prefect's architecture... Any ideas? Thanks.
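    (For illustration -- a minimal sketch of the difference described above, reusing the task from the original question; the function names are made up and only the return annotation differs:)
    from prefect import task

    # Fails at registration when storage is built: the return annotation keeps
    # a reference to the protobuf message class on the task function.
    @task(checkpoint=False)
    def fetch_typed() -> ProtoBufObject:
        return fetch_proto_buff_object()

    # Registers fine: no protobuf type is attached to the task's signature.
    @task(checkpoint=False)
    def fetch_untyped():
        return fetch_proto_buff_object()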

    Michael Adkins

    1 year ago
    Ah interesting! Thanks for reporting back. I presume that when the flow is serialized, the type definitions are unserializable. Can you confirm this does not occur with the types there and flow.register(build=False)?

    Florian K. (He/Him)

    1 year ago
    That is correct!

    Michael Adkins

    1 year ago
    Yep, so the flow metadata sent to cloud serializes just fine, but cloudpickle is refusing to serialize the flow code when your storage is built, since the type itself isn't serializable.
    I think you'd probably run into data passing problems in a distributed setting if the task return type isn't serializable anyway. I'd return the serialized data from your task directly and deserialize it where needed.
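    (For illustration -- a rough, untested sketch of that approach with Prefect 0.13-style APIs, assuming the amanzi_pb2.TimeSeries message from earlier in the thread; fetch_proto_buff_object and some_computation are placeholders:)
    from prefect import Flow, task

    import amanzi_pb2  # protobuf-generated module

    @task
    def fetch() -> bytes:
        # Plain bytes are trivially picklable, so they checkpoint and move
        # between workers without a custom serializer.
        return fetch_proto_buff_object().SerializeToString()

    @task
    def process(raw: bytes):
        # Deserialize right where the message is needed.
        ts = amanzi_pb2.TimeSeries()
        ts.ParseFromString(raw)
        return some_computation(ts)

    with Flow("protobuf-passing") as flow:
        process(fetch())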

    Jim Crist-Harif

    1 year ago
    Note that Python is (slowly) transitioning to not having type definitions stored as actual objects, but instead as strings. You can opt into this with from __future__ import annotations. In Python 3.10 this will become the default behavior: https://www.python.org/dev/peps/pep-0563/
    This would likely fix your issue, since the types wouldn't be stored on the flow as references anymore. This isn't really an issue with Prefect, but more with the protobuf library not playing well with pickle in that respect.
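    (For illustration -- what that opt-in looks like applied to the task from the original question; note the future import has to sit above every other import in the module:)
    from __future__ import annotations

    from prefect import task

    @task(checkpoint=False)
    def fetch() -> ProtoBufObject:
        # Under PEP 563 the annotation is stored as the string "ProtoBufObject"
        # rather than a reference to the protobuf class, so cloudpickle never
        # has to serialize the type itself.
        return fetch_proto_buff_object()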

    Matt Denno

    1 year ago
    @Jim Crist-Harif Thanks so much for the tip. Will definitely give it a try.