# ask-community
d
Hi everyone, has anyone written methods to cache a Pandas DataFrame? Is there some function to write the `pd.DataFrame` to CSV format before storage and run `read_csv` on it when getting it back from storage?
from io import BytesIO

import pandas as pd
from prefect.serializers import Serializer


class PandasSerializer(Serializer):
    __dispatch_key__ = "pandas_serializer"

    def dumps(self, obj: pd.DataFrame) -> bytes:
        # Write the DataFrame as CSV (without the index) and encode it to bytes for storage
        return obj.to_csv(index=False).encode("utf8")

    def loads(self, blob: bytes) -> pd.DataFrame:
        # Read the stored CSV bytes back into a DataFrame
        return pd.read_csv(BytesIO(blob))
I ended up writing one myself. Sharing it in case someone wants to use it, test it, or comment on it.
I'm not sure what the dispatch key is or why it's required.
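One way to sanity-check the `dumps`/`loads` logic without running Prefect at all is a plain pandas round trip. This is a minimal sketch that assumes only pandas; the helper names `df_to_csv_bytes` and `csv_bytes_to_df` are hypothetical and simply mirror the two serializer methods. It also shows a limitation of storing results as CSV: column types are re-inferred on load, so a datetime column comes back as text unless it is parsed explicitly (for example with `pd.read_csv(..., parse_dates=[...])`).

```python
from io import BytesIO

import pandas as pd


def df_to_csv_bytes(df: pd.DataFrame) -> bytes:
    # Hypothetical helper mirroring PandasSerializer.dumps:
    # CSV text without the index, encoded as UTF-8
    return df.to_csv(index=False).encode("utf8")


def csv_bytes_to_df(blob: bytes) -> pd.DataFrame:
    # Hypothetical helper mirroring PandasSerializer.loads:
    # parse the CSV bytes back into a DataFrame
    return pd.read_csv(BytesIO(blob))


original = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "score": [0.5, 1.25, 2.0],
        "name": ["a", "b", "c"],
        "when": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
    }
)

restored = csv_bytes_to_df(df_to_csv_bytes(original))

# Integer, float and string columns come back with the same values and dtypes
cols = ["id", "score", "name"]
pd.testing.assert_frame_equal(restored[cols], original[cols])

# CSV stores no type information, so the datetime column is re-read as text
assert not pd.api.types.is_datetime64_any_dtype(restored["when"])
```

If exact dtypes matter, a binary format that stores a schema may be a better fit than CSV; CSV keeps the stored files human-readable, which is the goal stated in this thread.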
a
Why don’t you return the df in a task and cache the task?
d
I want the stored results to be CSV files.
@Avinash Santhanagopalan Also, I don't think you can simply cache a pandas DataFrame.
a
I think you can cache the result of any task using `cache_key_fn`, so if you use that, you should be able to reuse the result for this DataFrame.