Marco Barbero Mota
05/17/2023, 6:16 PM
[...] `pickle` package and `pandas.read_pickle`? The current pickle serializer available in Prefect does not allow reading the files that are saved with a task.
```python
import pickle
from typing import Literal, TypeVar

from prefect.serializers import Serializer

D = TypeVar("D")


class PickleSerializer(Serializer):
    """
    Serializes data to pickle.
    """

    type: Literal["pickle"] = "pickle"

    def dumps(self, obj: D) -> bytes:
        """Encode the object into a blob of bytes."""
        return pickle.dumps(obj)

    def loads(self, blob: bytes) -> D:
        """Decode the blob of bytes into an object."""
        # pickle.loads takes bytes; pickle.load expects an open file object.
        return pickle.loads(blob)
```
Marvin
05/17/2023, 6:18 PM
[...] `PickleSerializer` to make it compatible with both the `pickle` package and `pandas.read_pickle`. Here's the updated version:
```python
import pickle
from typing import Literal, TypeVar

from prefect.serializers import Serializer

D = TypeVar("D")


class CustomPickleSerializer(Serializer):
    """
    Custom serializer for pickle files compatible with both
    the `pickle` package and `pandas.read_pickle`.
    """

    type: Literal["pickle"] = "pickle"

    def dumps(self, obj: D) -> bytes:
        """Encode the object into a blob of bytes."""
        return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

    def loads(self, blob: bytes) -> D:
        """Decode the blob of bytes into an object."""
        # fix_imports and encoding only affect pickles written by Python 2.
        return pickle.loads(blob, fix_imports=True, encoding="bytes")
```
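One way to sanity-check a serializer like the one above, outside Prefect, is a round trip with the standard library alone. The sketch below is only an illustration: the `dumps` and `loads` helpers mirror the method bodies above (minus the Python 2 compatibility arguments), and the `record` payload and `result.pkl` file name are hypothetical. It also shows the difference between `pickle.loads`, which takes bytes, and `pickle.load`, which takes an open binary file.

```python
import pickle
import tempfile
from pathlib import Path


def dumps(obj) -> bytes:
    """Object -> pickle bytes, as in the serializer's dumps method."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def loads(blob: bytes):
    """Pickle bytes -> object, as in the serializer's loads method."""
    return pickle.loads(blob)


record = {"name": "example", "values": [1, 2, 3]}  # hypothetical payload

blob = dumps(record)
restored_from_bytes = loads(blob)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "result.pkl"  # hypothetical file name
    path.write_bytes(blob)

    # pickle.load reads from an open binary file, not from a bytes object.
    with path.open("rb") as fh:
        restored_from_file = pickle.load(fh)

    # If pandas happens to be installed, read the same file with it.
    try:
        import pandas as pd

        restored_by_pandas = pd.read_pickle(path)
    except ImportError:
        restored_by_pandas = None
```

This only confirms that plain pickle bytes written to disk are readable by both functions. It says nothing about any extra wrapping that Prefect may apply when it persists a task result.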
This custom serializer should now produce pickle files that can be read using both `pickle.load` and `pandas.read_pickle`. I hope this helps you navigate the murky waters of serialization. Remember, with great power comes great responsibility, or something like that.
Marco Barbero Mota
05/17/2023, 6:40 PM
[...] Type 'CustomPickleSerializer' has a '__dispatch_key__' of type NoneType but a type of 'str' is required.
Marvin
05/17/2023, 10:18 PM
```python
import pickle
from typing import TypeVar

# Note: prefect.engine.serializers and the serialize/deserialize method names
# are the Prefect 1.x interface; Prefect 2.x exposes Serializer from
# prefect.serializers with dumps/loads.
from prefect.engine.serializers import Serializer

D = TypeVar("D")


class CustomPickleSerializer(Serializer):
    """
    Custom serializer for pickle files compatible with both
    the `pickle` package and `pandas.read_pickle`.
    """

    def __init__(self):
        self.type = "pickle"

    def serialize(self, value: D) -> bytes:
        """Encode the object into a blob of bytes."""
        return pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)

    def deserialize(self, value: bytes) -> D:
        """Decode the blob of bytes into an object."""
        return pickle.loads(value, fix_imports=True, encoding="bytes")
```
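The `__dispatch_key__` error quoted earlier suggests that the serializer class is looked up through a string key that must be available on the class itself. The toy registry below is not Prefect's implementation. `ToySerializer`, `GoodSerializer`, `BadSerializer`, and the `plain-pickle` key are hypothetical names, used only to sketch why a class-level string differs from an attribute assigned in `__init__`.

```python
from typing import ClassVar, Optional


class ToySerializer:
    """Toy base class that registers each subclass under a string key."""

    _registry: ClassVar[dict] = {}
    # Class-level key; subclasses are expected to override it with a str.
    type: ClassVar[Optional[str]] = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        key = cls.type  # read at class-definition time, before any __init__
        if not isinstance(key, str):
            raise ValueError(
                f"Type {cls.__name__!r} has a dispatch key of type "
                f"{key.__class__.__name__} but a type of 'str' is required"
            )
        ToySerializer._registry[key] = cls


class GoodSerializer(ToySerializer):
    type = "plain-pickle"  # hypothetical key, defined on the class


try:

    class BadSerializer(ToySerializer):
        def __init__(self):
            # Instance attribute only: it does not exist when the class
            # is registered, so the registry sees the inherited None.
            self.type = "pickle"

except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
```

In this toy model, `self.type` is assigned only when an instance is created, so it cannot supply a key at class-definition time. Whether a given Prefect version behaves the same way is an assumption worth verifying against its source.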
This should resolve the `__dispatch_key__` error you encountered. Give this version a try, and hopefully, it will be smooth sailing from here. Remember, even robots have their off days.
Marco Barbero Mota
05/17/2023, 10:18 PM
[...] `pickle.load()` or `pandas.read_pickle()`?
Marvin
05/17/2023, 10:21 PM
The default `PickleSerializer` in Prefect uses the cloudpickle library instead of the standard Python `pickle` library (source). Cloudpickle is designed to support more object types and is particularly useful for serializing objects that rely on local packages. However, this also means that the resulting pickle files might not be directly compatible with the standard `pickle.load()` or `pandas.read_pickle()` functions.
The main reason for this incompatibility is that cloudpickle extends the standard `pickle` library to cover objects it cannot handle, such as lambdas and interactively defined functions, so a payload containing them may not load unless cloudpickle is installed. When deserializing objects, your Python and pickle library versions must match the ones used at serialization time (source).
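Another possible cause is an extra encoding layer around the pickle bytes. The sketch below assumes, and this is an assumption about the stored format rather than something stated in this thread, that the persisted bytes are base64-encoded pickle data. The `payload` value and the `try_load` helper are hypothetical. It shows that `pickle.loads` rejects the wrapped bytes but accepts them once the base64 layer is removed.

```python
import base64
import pickle

payload = {"rows": 3, "columns": ["a", "b"]}  # hypothetical task result

raw = pickle.dumps(payload)
wrapped = base64.encodebytes(raw)  # assumed extra encoding layer


def try_load(blob: bytes):
    """Attempt pickle.loads and return (ok, value_or_exception)."""
    try:
        return True, pickle.loads(blob)
    except Exception as exc:  # unpickling can raise many exception types
        return False, exc


ok_raw, value_raw = try_load(raw)
ok_wrapped, value_wrapped = try_load(wrapped)
ok_unwrapped, value_unwrapped = try_load(base64.decodebytes(wrapped))
```

If a result file fails with `pickle.load()`, inspecting its first bytes can help tell plain pickle data from an encoded or wrapped format. Pickles written with protocol 2 or higher start with the byte `0x80`.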
In summary, the default `PickleSerializer` in Prefect uses cloudpickle, which may produce pickle files that are not directly compatible with the standard `pickle.load()` or `pandas.read_pickle()` functions because of differences in how the data is serialized. Life is full of surprises, isn't it?