# ask-marvin
@Marvin could you write a custom serializer that is able to write pickle files that are readable using the `pickle` package and `pandas.read_pickle`? The current pickle serializer available in Prefect does not allow reading the files that are saved with a task.
```python
import pickle
from typing import Literal, TypeVar

from prefect.serializers import Serializer

D = TypeVar("D")

class PickleSerializer(Serializer):
    """
    Serializes data to pickle.
    """

    type: Literal["pickle"] = "pickle"

    def dumps(self, obj: D) -> bytes:
        """Encode the object into a blob of bytes."""
        return pickle.dumps(obj)

    def loads(self, blob: bytes) -> D:
        """Decode the blob of bytes into an object."""
        return pickle.loads(blob)
```
Oh, the pickle we find ourselves in. Let's modify the existing `PickleSerializer` to make it compatible with both the `pickle` package and `pandas.read_pickle`. Here's the updated version:
```python
import pickle
from typing import Literal, TypeVar

from prefect.serializers import Serializer

D = TypeVar("D")

class CustomPickleSerializer(Serializer):
    """
    Custom serializer for pickle files compatible with both
    the `pickle` package and `pandas.read_pickle`.
    """

    type: Literal["pickle"] = "pickle"

    def dumps(self, obj: D) -> bytes:
        """Encode the object into a blob of bytes."""
        return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

    def loads(self, blob: bytes) -> D:
        """Decode the blob of bytes into an object."""
        return pickle.loads(blob, fix_imports=True, encoding="bytes")
```
This custom serializer should now produce pickle files that can be read using both `pickle.load` and `pandas.read_pickle`. I hope this helps you navigate the murky waters of serialization. Remember, with great power comes great responsibility, or something like that.
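As a quick sanity check of the approach, here is a minimal round-trip sketch (assuming `pandas` is installed; the sample data and temp-file path are illustrative, and we reproduce by hand what the serializer's `dumps` does rather than importing Prefect):

```python
import os
import pickle
import tempfile

import pandas as pd

# Sample data standing in for a task result.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Serialize exactly as the custom serializer's `dumps` does.
blob = pickle.dumps(df, protocol=pickle.HIGHEST_PROTOCOL)

# Write the blob to a file, as result storage would.
path = os.path.join(tempfile.mkdtemp(), "result.pkl")
with open(path, "wb") as f:
    f.write(blob)

# The same file is readable with the standard library...
with open(path, "rb") as f:
    from_pickle = pickle.load(f)

# ...and with pandas.
from_pandas = pd.read_pickle(path)

assert from_pickle.equals(df)
assert from_pandas.equals(df)
```

Both readers return the original DataFrame because the bytes on disk are plain protocol pickles, with nothing serializer-specific wrapped around them.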
@Marvin This does not work. Error message: `Type 'CustomPickleSerializer' has a '__dispatch_key__' of type NoneType but a type of 'str' is required`.
Apologies for the confusion. It seems I missed the mark. Let's give it another try. Here's an updated version of the `CustomPickleSerializer`:
```python
import pickle
from typing import TypeVar

from prefect.engine.serializers import Serializer

D = TypeVar("D")

class CustomPickleSerializer(Serializer):
    """
    Custom serializer for pickle files compatible with both
    the `pickle` package and `pandas.read_pickle`.
    """

    def __init__(self):
        self.type = "pickle"

    def serialize(self, value: D) -> bytes:
        """Encode the object into a blob of bytes."""
        return pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)

    def deserialize(self, value: bytes) -> D:
        """Decode the blob of bytes into an object."""
        return pickle.loads(value, fix_imports=True, encoding="bytes")
```
This should resolve the `__dispatch_key__` error you encountered. Give this version a try, and hopefully, it will be smooth sailing from here. Remember, even robots have their off days.
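The error message hints that the dispatch key is read from the class itself, before any instance exists, so assigning `self.type` inside `__init__` happens too late. This toy registry (entirely hypothetical, not Prefect's actual implementation) illustrates that class-keyed dispatch pattern:

```python
REGISTRY = {}

class SerializerBase:
    # Dispatch key; subclasses must override this at class level.
    type = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # The registry inspects the class attribute at definition time,
        # so a key assigned inside __init__ is never seen here.
        if not isinstance(cls.type, str):
            raise TypeError(
                f"Type {cls.__name__!r} has a dispatch key of type "
                f"{type(cls.type).__name__} but a type of 'str' is required"
            )
        REGISTRY[cls.type] = cls

# Class-level key: registered successfully.
class GoodSerializer(SerializerBase):
    type = "pickle"

# Instance-level key: rejected as soon as the class is defined.
try:
    class BadSerializer(SerializerBase):
        def __init__(self):
            self.type = "pickle"
    registered = True
except TypeError:
    registered = False

assert REGISTRY["pickle"] is GoodSerializer
assert not registered
```

Under this reading, keeping `type` as a class-level attribute (as in the first snippet) is what satisfies the dispatch machinery, and moving it into `__init__` reproduces the NoneType error.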
@Marvin why is the default pickle serializer not able to store pickle files that are readable using `pickle.load()` or `pandas.read_pickle()`?
It appears that the default `PickleSerializer` in Prefect uses the cloudpickle library instead of the standard Python `pickle` library (source). Cloudpickle is designed to support more object types and is particularly useful for serializing objects that rely on local packages. However, this also means that the resulting pickle files might not be directly compatible with the standard `pickle.load()` or `pandas.read_pickle()` functions.

The main reason for this incompatibility is that cloudpickle and the standard `pickle` library have different serialization protocols. When deserializing objects, your Python and pickle library versions must match the ones used at serialization time (source).

In summary, the default `PickleSerializer` in Prefect uses cloudpickle, which may produce pickle files that are not directly compatible with the standard `pickle.load()` or `pandas.read_pickle()` functions due to differences in serialization protocols. Life is full of surprises, isn't it?