# ask-marvin
Florian:
Hey @Marvin. Prefect 3 caches task input parameters. I have a huge class I'm passing to every task. Is there a way to avoid that class being cached, globally, by adding something to the class itself?
Marvin:
In Prefect 3, task caching is determined primarily by the inputs provided to the task, the code definition of the task, and, depending on the context, the flow or task run ID. This means that by default, any parameters you pass to a task may be used to compute the cache key for that task's results.

If you want to avoid caching for a specific class you pass to tasks, you might consider altering how that class is passed or how the tasks are designed. Prefect doesn't provide a built-in mechanism to mark specific classes as excluded from caching, but you can customize cache behavior by changing how task inputs are handled or persisted. For detailed guidance, see the Prefect 3 documentation on task caching.

Sometimes, the only way to escape the maze of caching is to redefine how you think about task inputs. Or just embrace it, like an existential dread that's really good at optimizing your workflows.
Florian:
I have a solution, but it's a bit hacky and I would like someone from Prefect to give me an opinion. @Nate maybe?

Thread wrap-up:
• It's about task caching.
• Tasks cache their parameters by default.
• I have a Snowflake class instance that is passed everywhere, and it is not serializable by the cache policies (rightfully so).
• I love Marvin's answer, "embrace the change", I really do, but I can't. Years of dev, multiple teams, lots of flows.

So here's my solution:
• I found in the code, in `utilities.pydantic`, that if the object inherits from `BaseModel`, Prefect simply calls `model_dump`.
• So I've made my Snowflake class extend `BaseModel` and implemented `model_dump`.
• Problem solved.

Do we think this is a viable solution?
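For reference, the workaround described above can be sketched with Pydantic alone (the class and field names here are illustrative stand-ins, not the real Snowflake client): overriding `model_dump` controls exactly what any input-hashing code will see.

```python
from pydantic import BaseModel, ConfigDict


class SnowflakeClient:
    """Stand-in for the real, non-serializable Snowflake connection."""


class SnowflakeWrapper(BaseModel):
    # Allow the arbitrary (non-pydantic) client type as a model field
    model_config = ConfigDict(arbitrary_types_allowed=True)

    account: str
    client: SnowflakeClient

    def model_dump(self, **kwargs) -> dict:
        # Expose only stable, serializable fields; the live client
        # never reaches whatever dumps this model (e.g. input hashing).
        return {"account": self.account}


wrapper = SnowflakeWrapper(account="acme", client=SnowflakeClient())
print(wrapper.model_dump())  # {'account': 'acme'}
```

Note that shadowing `model_dump` only affects that one method; the `@model_serializer` approach shown later in the thread hooks into Pydantic's serialization machinery itself, so `model_dump_json` stays consistent too.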
Nate:
hi @Florian Giroud! You can do this, but I'll mention that this is a good reason to customize serialization using the first-class Pydantic mechanisms:
```python
from pydantic import BaseModel, ConfigDict, model_serializer

from prefect import flow, task
from prefect.cache_policies import INPUTS


class NotSerializable:
    """A class that refuses to be serialized"""

    def __getstate__(self):
        raise TypeError("I refuse to be serialized!")


class Wrapper(BaseModel):
    """Example of controlling what gets cached for a non-serializable object"""

    model_config = ConfigDict(arbitrary_types_allowed=True)

    name: str
    bad_object: NotSerializable

    @model_serializer
    def ser_model(self) -> dict:
        """Only serialize the name, not the problematic object"""
        return {"name": self.name}


@task(cache_policy=INPUTS)
def use_object(wrapper: Wrapper) -> str:
    return f"Used {wrapper.name}"


@flow
def demo_flow():
    wrapper = Wrapper(name="test", bad_object=NotSerializable())
    return use_object(wrapper)


if __name__ == "__main__":
    print(demo_flow())
```
note that if you comment out this
```python
    @model_serializer
    def ser_model(self) -> dict:
        """Only serialize the name, not the problematic object"""
        return {"name": self.name}
```
it will fail like
```
ValueError: Unable to create hash - objects could not be serialized.
  JSON error: Unable to serialize unknown type: <class '__main__.NotSerializable'>
  Pickle error: I refuse to be serialized!

This often occurs when task inputs contain objects that cannot be cached like locks, file handles, or other system resources.

To resolve this, you can:
  1. Exclude these arguments by defining a custom `cache_key_fn`
  2. Disable caching by passing `cache_policy=NONE`
```
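The error's first suggestion can be sketched with the standard library alone: a `cache_key_fn`-style callable receives the task-run context and the call parameters and returns the cache key, so it can simply skip the problematic argument. The `client` parameter name and the commented `@task(...)` wiring are illustrative assumptions, and the hashing scheme is a stdlib stand-in, not Prefect's internal one.

```python
import hashlib
import json


def key_without_client(context, parameters) -> str:
    """Hash every parameter except the non-serializable `client`
    (illustrative name), so the cache key ignores it entirely."""
    cacheable = {k: repr(v) for k, v in sorted(parameters.items()) if k != "client"}
    return hashlib.sha256(json.dumps(cacheable).encode()).hexdigest()


# Wiring it up would look like this (requires Prefect, shown for context):
#
#   from prefect import task
#
#   @task(cache_key_fn=key_without_client)
#   def run_query(sql: str, client: SnowflakeClient) -> str:
#       ...

# Same inputs minus the client => same key, so caching still works:
k1 = key_without_client(None, {"sql": "select 1", "client": object()})
k2 = key_without_client(None, {"sql": "select 1", "client": object()})
assert k1 == k2
```

Prefect 3's cache policies also support subtracting named inputs directly, e.g. `cache_policy=INPUTS - "client"`, which may be the shortest route when the rest of the inputs are fine to hash.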
Florian:
Oh thanks, indeed a more proper way.