
Deceivious

04/27/2023, 9:21 AM
Hi everyone, has anyone written methods to cache a Pandas DataFrame? Something that writes the pd.DataFrame to CSV format before storage and uses read_csv when reading it back from storage?
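from io import BytesIO

import pandas as pd
from prefect.serializers import Serializer


class PandasSerializer(Serializer):
    # Identifier used by Prefect's type-dispatch registry to look up this subclass.
    __dispatch_key__ = "pandas_serializer"

    def dumps(self, obj: pd.DataFrame) -> bytes:
        # Write the frame as CSV text (row index dropped) and encode it to bytes.
        return obj.to_csv(index=False).encode("utf8")

    def loads(self, blob: bytes) -> pd.DataFrame:
        # Parse the stored CSV bytes back into a DataFrame.
        return pd.read_csv(BytesIO(blob))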
Ended up writing one myself. Sharing it in case someone wants to use it, test it, or comment on it.
Not sure what the dispatch key is or why it is required.
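
For anyone testing the serializer above, here is a minimal sketch of the same CSV round trip using pandas only, without Prefect. The helper names `dumps_csv` and `loads_csv` and the sample frames are made up for illustration. It also shows one caveat worth checking: CSV is plain text, so a datetime column does not come back with its original dtype unless `read_csv` is told to parse it.

```python
from io import BytesIO

import pandas as pd


def dumps_csv(df: pd.DataFrame) -> bytes:
    # Same logic as the serializer's dumps: CSV text without the index, as bytes.
    return df.to_csv(index=False).encode("utf8")


def loads_csv(blob: bytes) -> pd.DataFrame:
    # Same logic as the serializer's loads: parse CSV bytes into a DataFrame.
    return pd.read_csv(BytesIO(blob))


# Hypothetical sample data with simple integer and string columns.
original = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
restored = loads_csv(dumps_csv(original))

# Simple integer and string columns survive the round trip.
same_values = restored.equals(original)

# Datetime columns are written as text and are not parsed back by default.
dated = pd.DataFrame({"when": pd.to_datetime(["2023-04-27"])})
dated_restored = loads_csv(dumps_csv(dated))
datetime_preserved = dated_restored["when"].dtype == dated["when"].dtype
```

If dtype fidelity matters, passing `parse_dates` or `dtype` to `read_csv` inside `loads` is one option. The row index is also dropped because of `index=False`.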

Avinash Santhanagopalan

04/27/2023, 2:33 PM
Why don’t you return the df in a task and cache the task?

Deceivious

04/27/2023, 2:34 PM
I want the stored result to be in CSV format.
👍 1
@Avinash Santhanagopalan Also, I don't think you can simply cache a pandas DataFrame.

Avinash Santhanagopalan

04/27/2023, 8:55 PM
I think you can cache the result of any task using cache_key_fn. If you use that, you should be able to reuse the result of the task that returns this DataFrame.