https://prefect.io logo
m

Matt Denno

02/04/2021, 8:25 PM
Hello, I am having and issue that relates to serializing my results and setting checkpoint=False. I have a task that returns a Google ProtoBuff object, say like:
Copy code
@task(checkpoint=False)
def fetch() -> ProtoBufObject:
    return fetch_proto_buff_object()
In the past I have set checkpoint=False to avoid serialization errors with the ProtoBuff objects, but now, even with checkpoint=False I am getting the following when i try to register the flow.
Copy code
TypeError: can't pickle google.protobuf.pyext._message.MessageDescriptor objects
To resolve this I tried to create a serializer for the ProtoBuff like so:
Copy code
class PBSerializer(Serializer):
    def serialize(self, value: Any) -> bytes:
        # transform a Python object into bytes
        return value.SerializeToString()

    def deserialize(self, value: bytes) -> Any:
        # recover a Python object from bytes
        ts = amanzi_pb2.TimeSeries
        return ts.ParseFromString(value)
and called like:
Copy code
@task(checkpoint=False, result=LocalResult(serializer=PBSerializer()))
def fetch() -> ProtoBufObject:
    return fetch_proto_buff_object()
But I get the same error still. It doesn't seem like the result serializer is being used, but it is also not causing any errors. I have tried to debug and step through the code but can't figure out what is happening. Any help, guidance, ideas for how to resolve would be welcomed. I am on v 0.13.19
z

Zanie

02/04/2021, 8:53 PM
Hi @Matt Denno -- this error is occurring on registration or at runtime?
m

Matt Denno

02/04/2021, 8:55 PM
Hi Michael - It is happening at registration. the flow run fine if i just flow.run()
z

Zanie

02/04/2021, 8:56 PM
This doesn't seem to have anything to do with the checkpointing/results then, those would only throw errors at flow runtime. Do you have a global variable holding a
MessageDescriptor
object somewhere in your flow / task files?
m

Matt Denno

02/04/2021, 9:03 PM
Hmm, OK, that is good to know - that it is probably not really related to the checkpoint/results. I guess we had a different ProtoBuff issue in the past the caused us to need to use the checkpoint=False and I thought it might be related to the current issue. I am not sure if there is a global variable holding a message descriptor somewhere, but I will for sure look into that. Thanks.
f

Florian K. (He/Him)

02/04/2021, 11:43 PM
Hi, I am working with @Matt Denno on this project. I set up a simple test module and found that calling Flow.register() works just fine when we remove the typing from the tasks that use Google ProtoBuff arguments. I am also not 100% sure if our PB messages hold a
MessageDescriptor
object. A quick check shows me that they hold a
FileDescriptor
object which itself extends
DescriptorBase
. However, I am not delaring a global variable that holds a PB message. Now that we identified typing as the culprit I am wondering if this is a bug or the result of prefects architecture....Any ideas? Thanks,
z

Zanie

02/05/2021, 12:27 AM
Ah interesting! Thanks for reporting back. I presume when the flow is serialized the type definitions are unserializable. Can you confirm this does not occur with the types there and
flow.register(build=False)
?
f

Florian K. (He/Him)

02/05/2021, 12:31 AM
That is correct!
z

Zanie

02/05/2021, 12:33 AM
Yep so the flow metadata sent to cloud serializes just fine but cloudpickle is refusing to serialize the flow code when your storage is built since the type itself isn't serializable.
I think you'd probably run into data passing problems in a distributed setting if the task return type isn't serializable anyway. I'd return the serialized data from your task directly and deserialize it where needed.
j

Jim Crist-Harif

02/06/2021, 5:57 AM
Note that python is (slowly) transitioning to not having type definitions stored as actual objects, but instead as strings. You can opt into this with
Copy code
from __future__ import annotations
In python 3.10 this will become the default behavior: https://www.python.org/dev/peps/pep-0563/
This would likely fix your issue, since the types wouldn't be stored on the flow anymore as references. This isn't really an issue with prefect, but more with the protobuf library not playing well with pickle in that respect.
m

Matt Denno

02/08/2021, 2:42 PM
@Jim Crist-Harif Thanks so much for the tip. Will definitely give it a try.
23 Views