<@ULVA73B9P> how does prefect task caching work...
# ask-marvin
t
@Marvin how does prefect task caching work on prefect 2.0 and prefect 3.0? Where are the results cached: to disk, in memory, or to cloud storage? (I'm using open source prefect deployed on kubernetes, with work pools)
m
In both Prefect 2.0 and 3.0, task caching allows you to reuse the results of tasks that might be costly to reproduce with every flow, making your workflows more efficient and ensuring the idempotency of your pipelines.

### Prefect 2.0 Caching
Tasks in Prefect 2.0 support caching their return values, which can be reused in subsequent runs. However, the specific storage location (e.g., disk, memory, or cloud storage) is not detailed in the documentation excerpts available. Typically, the caching mechanism would involve some form of result persistence, likely configured by the user with the openness of the Prefect infrastructure in mind.

Source for Prefect 2.0 caching: Prefect 2.0 Task Caching Docs

### Prefect 3.0 Caching
In Prefect 3.0, task caching is implemented with specific configurations that hash based on:
- The inputs provided to the task
- The code definition of the task
- The prevailing flow run ID, or task run ID if executed autonomously

These hashed values form a task's cache key. By default, calling the same task with the same inputs more than once within a flow will result in the cached behavior for all calls after the first. However, this caching also relies on result persistence, and the default setting for this is off. Again, the result persistence usually involves external storage and can be configured based on the user's infrastructure.

Source for Prefect 3.0 caching: Prefect 3.0 Task Caching Docs

In both versions, it's important to configure your environment, such as Kubernetes, with the necessary storage options that meet your caching and persistence needs, since Prefect's open-source configuration allows for flexibility and remote execution scenarios.
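To make the cache-key idea above concrete, here is a minimal plain-Python sketch of how three ingredients (inputs, code definition, run ID) could be combined into one key. It is an illustration only, not Prefect's actual implementation; the function name `sketch_cache_key` and its parameters are hypothetical.

```python
import hashlib
import json


def sketch_cache_key(inputs: dict, source_code: str, run_id: str) -> str:
    """Hypothetical helper: hash inputs, code definition, and run ID together."""
    parts = [
        json.dumps(inputs, sort_keys=True, default=repr),  # the task inputs
        source_code,                                       # the task's code definition
        run_id,                                            # flow run ID (or task run ID)
    ]
    # Join with a separator and hash; identical ingredients give an identical key.
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


source = "def add(x, y): return x + y"

key_first_call = sketch_cache_key({"x": 1, "y": 2}, source, "run-a")
key_second_call = sketch_cache_key({"y": 2, "x": 1}, source, "run-a")
key_other_run = sketch_cache_key({"x": 1, "y": 2}, source, "run-b")

# Same inputs, code, and run ID: the second call would be a cache hit.
print(key_first_call == key_second_call)
# A different run ID changes the key, so a key built this way is not shared across runs.
print(key_first_call == key_other_run)
```

Because the run ID is part of the key in this sketch, reuse stays within a single run; sharing results across runs would need a key that leaves the run ID out.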
n
Where are the results cached - to disk, in memory, or to cloud storage?
these are all valid options: you set `result_storage` on the task or flow you want to configure, which can be an `S3Bucket`, a `LocalFileSystem`, or some other writeable filesystem
t
Thanks Nate, is it the same for 2.0 and 3.0?
n
the differences are pretty deep in the weeds, for most users it will be exactly the same between 2 and 3
t
Okay, so I can assume that I can have cross-flow / cross-deployment caching if I just configure an `S3Bucket` filesystem / block and point to it with the `result_storage` arg to the `@task` decorator?
for v2 and v3?
n
yes! also this discussion might be useful as an example just in terms of caching mechanics / cache policies
t
Nice! Thanks very much Nate! 🙏