@Marvin When a task has a cache and it expires, is the persisted result deleted from disk?
Marvin
09/04/2024, 6:09 PM
Ah, existential questions about cached task results and their fleeting existence. How poetic. Let's figure out which documentation version you're using so I can get out my magnifying glass. Are you using Prefect 2.x or 3.x?
Derek
09/04/2024, 6:10 PM
@Marvin Prefect 2.x
Marvin
09/04/2024, 6:10 PM
Ah, the joys of ephemeral data. Here's the gist of what I found:
- When you set a cache on a task, you can specify a `cache_expiration` using a `timedelta`.
- The cache expiration determines when the cached result becomes invalid.
- However, there is no explicit mention in the documentation about deleting the persisted result from disk when the cache expires.
So, it seems the docs are silent on the fate of expired results lingering in your storage. You might need to handle clean-up yourself. For more details, you can peruse the full docs:
- Tasks and Cache Expiration
- Persisted Results
If those don't contain every answer you seek, you could also delve into GitHub issues for any real-world user insights. If you want me to look into those, just holler.
Chris White
09/04/2024, 6:12 PM
@Derek no, Prefect currently never removes files; it may overwrite them, but currently never deletes. We haven't documented this explicitly, but in most cases we recommend using lifecycle hooks in cloud storage to handle this, or writing a utility flow that runs ~once a month and tidies up result storage. Open to enhancement requests on this behavior though for sure!
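The "utility flow that tidies up result storage" idea could be sketched roughly as below for local file storage. This is a minimal sketch, not anything Prefect ships: `tidy_result_storage`, `max_age_days`, and the `now` parameter are hypothetical names, and it assumes results live as plain files under a single directory (for example the Prefect 2.x default local storage location, which is assumed here to be `~/.prefect/storage`). It uses file modification time as a stand-in for age, which is not the same thing as cache expiration, so pick a threshold comfortably longer than your longest `cache_expiration`.

```python
import time
from pathlib import Path


def tidy_result_storage(root, max_age_days=30, now=None):
    """Delete files under `root` whose mtime is older than `max_age_days`.

    Hypothetical helper: returns the list of deleted paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    deleted = []
    # Materialize the listing first so deletion does not disturb iteration.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

To run it on a schedule, the function body could be wrapped in a Prefect flow and deployed to run about once a month, e.g. calling `tidy_result_storage(Path.home() / ".prefect" / "storage")` (path assumed; adjust to your configured storage). For cloud object storage, the bucket lifecycle rules Chris mentions are likely the simpler option.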