# prefect-community
v
Hi. How does the default `checkpointing` work in Prefect 1.0.0? In the documentation I see that it is enabled by default, but what is the default `Result`?
k
The default is a `LocalResult` that is cloudpickled.
v
@Kevin Kho Ok, thanks.
Next question is about caching. From the documentation it is not clear where the Input and Output caches store the cached data.
k
Input is not cached. Output is the result, which is stored in the `.prefect` folder by default. Input cache was deprecated.
v
So, are Output caching and Result (checkpointing) the same thing?
k
Not exactly, but when talking to people here they treat them synonymously. Caching is actually more about not re-running the task across Flow Runs, while checkpointing is for restarting the same flow run from failure.
v
But where does Output Caching store the cached data?
k
Still the Result, because that is what gets pulled for cached tasks. But the cache details (how long to cache for) are stored in the backend (Prefect Cloud).
v
I believe this documentation is a little bit outdated?
It explains that they are totally different things.
k
No no this is right. This is what I mean when I say checkpointing is different from caching
v
So, the `result` arg is used for both caching and checkpointing?
To enable checkpointing, set `checkpointing=True`; to enable caching, set `cache_for`?
k
Think of it this way:
1. You can checkpoint a task. If the flow fails and restarts, you can load the checkpoint to resume (this has nothing to do with caching).
2. But if you start a new flow run, the checkpoints from the previous run don’t matter. Everything will run again.
3. So caching is the mechanism that lets you tell future runs not to run an expensive task again.
4. If a task is in a cached state, Prefect will pull the result, if it exists, in future flow runs.
You define result and caching independently. Yes to setting checkpointing and caching, but checkpointing is set to True by default for runs with Prefect Cloud. So it’s the other way around: you need to explicitly say `checkpointing=False` for the tasks you don’t want to checkpoint. For local runs, checkpointing is turned off by default.
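The distinction between checkpoints and caching described above can be sketched as a toy model. This is a minimal illustration using only the standard library, not Prefect code: `checkpoints`, `cache`, `executions`, and `run_task` are hypothetical names invented here. The only assumption taken from the conversation is the scoping rule: a checkpoint helps a restart of the same flow run, while a cache entry helps later flow runs.

```python
from typing import Any, Callable, Dict, List, Tuple

# Toy stand-ins (hypothetical, not Prefect internals).
checkpoints: Dict[Tuple[str, str], Any] = {}  # keyed by (flow_run_id, task_name)
cache: Dict[str, Any] = {}                    # keyed by task_name only
executions: List[str] = []                    # log of real task executions


def expensive() -> int:
    return 42


def run_task(flow_run_id: str, name: str, fn: Callable[[], Any], cached: bool) -> Any:
    # Restart of the same flow run: the checkpoint is reused.
    if (flow_run_id, name) in checkpoints:
        return checkpoints[(flow_run_id, name)]
    # New flow run: only a cache entry can prevent re-execution.
    if cached and name in cache:
        return cache[name]
    value = fn()
    executions.append(f"{flow_run_id}:{name}")
    checkpoints[(flow_run_id, name)] = value
    if cached:
        cache[name] = value
    return value


run_task("run-1", "plain", expensive, cached=False)
run_task("run-1", "plain", expensive, cached=False)   # restart: checkpoint reused
run_task("run-2", "plain", expensive, cached=False)   # new run: executes again
run_task("run-1", "costly", expensive, cached=True)
run_task("run-2", "costly", expensive, cached=True)   # new run: cache hit
```

In this sketch the uncached task executes once per flow run, while the cached task executes only in the first run.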
v
OK, now it is clearer.
But it is still not clear how the cache stores data, and where.
If I set `S3Result` and `cache_for=1h`, will it delete the cached data from S3?
k
There are two parts. The `S3Result` is stored in S3. The cache record with the one-hour expiry is stored on Prefect Cloud. So Prefect never deletes data. When the same task runs again in the future, it will check the cache, and if the cache is no longer valid, it will run the task again. So the Cached state is something we store in Cloud or Server.
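The two-part split described here (data in the result location, expiry in the backend) can be modelled roughly as follows. This is a hedged standard-library sketch, not Prefect's implementation: `result_store` stands in for S3, `cache_expiry` stands in for the Cached state held by Cloud or Server, and `run_cached` is a hypothetical helper. The clock is passed in explicitly so the behaviour is deterministic.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, Tuple

result_store: Dict[str, Any] = {}        # stands in for S3; nothing is ever deleted
cache_expiry: Dict[str, datetime] = {}   # stands in for the backend's cache record


def run_cached(name: str, fn: Callable[[], Any],
               cache_for: timedelta, now: datetime) -> Tuple[Any, bool]:
    """Return (value, ran), where ran is False when the cache was still valid."""
    expires = cache_expiry.get(name)
    if expires is not None and now < expires and name in result_store:
        return result_store[name], False
    value = fn()
    result_store[name] = value            # written (or overwritten), never pruned
    cache_expiry[name] = now + cache_for  # only this record expires
    return value, True


t0 = datetime(2022, 1, 1, 9, 0)
one_hour = timedelta(hours=1)

first = run_cached("extract", lambda: "v1", one_hour, t0)
within = run_cached("extract", lambda: "v2", one_hour, t0 + timedelta(minutes=30))
still_stored = "extract" in result_store  # data survives regardless of expiry
after = run_cached("extract", lambda: "v3", one_hour, t0 + timedelta(hours=2))
```

Expiry only changes whether the task runs again; in this model the stored value stays in place until a later run overwrites it.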
v
Now it makes sense.
Before, you said that Output Caching is also stored in the Result.
I probably misunderstood you.
And last question: what does it mean that Input Caching is deprecated?
With checkpointing enabled by default, is there no point in this feature?
k
Sorry, my responses were too short cuz there’s just a lot to respond to this morning haha. So there was this feature where we held onto the inputs of a task execution so that when you restart from failure, they are held in memory. I think this was just changed to Prefect knowing where to get them from instead. I don’t know exactly, I just found it reading the source code, but it’s before my time.
v
Thank you for your time!!!
k
Of course!
v
Sorry, one more thing. You said that checkpoints are only for retries, but what if the checkpoint target already exists for a new run?
Will it be ignored?
k
If you mean this target. Target is a form of caching, so it persists across flow runs. If you just do `Result(…, location=…)`, then this will not be respected.
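The difference between a target and a plain result location can be illustrated with a small file-based model. This is a sketch under stated assumptions, not Prefect code: `run_with_target` and `run_with_location` are hypothetical helpers, and `pickle` is used as a stand-in serializer. The only behaviour taken from the conversation is that an existing target is honoured by a new run, while a location alone is not.

```python
import os
import pickle
import tempfile
from typing import Any, Callable, Tuple


def run_with_target(fn: Callable[[], Any], target: str) -> Tuple[Any, bool]:
    """Target semantics: an existing file short-circuits the task body."""
    if os.path.exists(target):
        with open(target, "rb") as fh:
            return pickle.load(fh), False  # loaded from disk, task not run
    value = fn()
    with open(target, "wb") as fh:
        pickle.dump(value, fh)
    return value, True


def run_with_location(fn: Callable[[], Any], location: str) -> Tuple[Any, bool]:
    """Location-only semantics: the file is written but never checked up front."""
    value = fn()
    with open(location, "wb") as fh:
        pickle.dump(value, fh)
    return value, True


workdir = tempfile.mkdtemp()
target_path = os.path.join(workdir, "target.pkl")
location_path = os.path.join(workdir, "location.pkl")

first_target = run_with_target(lambda: "fresh", target_path)
second_target = run_with_target(lambda: "recomputed", target_path)
first_location = run_with_location(lambda: "fresh", location_path)
second_location = run_with_location(lambda: "recomputed", location_path)
```

Each pair of calls plays the role of two separate flow runs: the second target call reuses the file, and the second location call recomputes and overwrites it.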
v
cool, thanks