Vadym Dytyniak

03/08/2022, 9:51 AM
Hi. How does the default `checkpointing` work in Prefect 1.0.0? In the documentation I see that it is enabled by default, but what is the default `Result`?

Kevin Kho

03/08/2022, 2:18 PM
The default is a `LocalResult` that is serialized with cloudpickle.
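To make the idea of a pickled local result concrete, here is a minimal sketch, not Prefect's actual implementation. It uses the standard-library `pickle` as a stand-in for cloudpickle, and the helper names `write_local_result` and `read_local_result`, the temporary directory, and the file name are all assumptions made up for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def write_local_result(value, directory: Path, filename: str) -> Path:
    """Serialize `value` to bytes and write it under `directory`."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / filename
    path.write_bytes(pickle.dumps(value))
    return path


def read_local_result(path: Path):
    """Load a previously written result back into memory."""
    return pickle.loads(path.read_bytes())


# Use a throwaway directory so the sketch does not touch a real results folder.
results_dir = Path(tempfile.mkdtemp()) / "results"
saved_path = write_local_result({"rows": 3}, results_dir, "my_task.pkl")
restored = read_local_result(saved_path)
```

The point is only that a checkpoint is the task's return value written somewhere durable so it can be read back later.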

Vadym Dytyniak

03/08/2022, 2:19 PM
@Kevin Kho Ok, thanks.
Next question is about caching. The documentation does not make it clear where Input and Output caching store the cached data.

Kevin Kho

03/08/2022, 2:23 PM
Input is not cached. Output is the result, which is stored in the `.prefect` folder by default. Input caching was deprecated.

Vadym Dytyniak

03/08/2022, 2:28 PM
So Output caching and Result (checkpointing) are the same thing?

Kevin Kho

03/08/2022, 2:30 PM
Not exactly, but people here often treat them as synonymous. Caching is about not re-running a task across flow runs, while checkpointing is for restarting the same flow run from failure.

Vadym Dytyniak

03/08/2022, 2:31 PM
But where does Output caching store the cached data?

Kevin Kho

03/08/2022, 2:33 PM
Still in the Result, because that is what gets pulled for cached tasks. But the cache details (how long to cache for) are stored in the backend (Prefect Cloud).

Vadym Dytyniak

03/08/2022, 2:34 PM
I believe this documentation is a little outdated?
It explains that they are totally different things.

Kevin Kho

03/08/2022, 2:36 PM
No, no, this is right. This is what I mean when I say checkpointing is different from caching.

Vadym Dytyniak

03/08/2022, 2:43 PM
So the `result` arg is the arg for both caching and checkpointing?
To enable checkpointing, set `checkpoint=True`; to enable caching, set `cache_for`?

Kevin Kho

03/08/2022, 2:45 PM
Think of it this way:
1. You can checkpoint a task. If the flow fails and restarts, you can load the checkpoint to resume (this has nothing to do with caching).
2. But if you start a new flow run, the checkpoints from the previous run don’t matter. Everything will run again.
3. So caching is the mechanism that lets you tell future runs not to run an expensive task again.
4. If a task is in a cached state, Prefect will pull the result, if it exists, in future flow runs.
You define result and caching independently. Yes to setting checkpointing and caching, but checkpointing is set to True by default for runs with Prefect Cloud. So it’s the other way around: you need to explicitly set `checkpoint=False` for the tasks you don’t want to checkpoint. For local runs, checkpointing is turned off by default.
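The four points above can be sketched as a toy model. This is plain Python and does not use Prefect; the dictionaries `checkpoints` and `cache`, the function `run_task`, and the run IDs are hypothetical names chosen only to show that a checkpoint is tied to one flow run while a cache entry is shared across runs until it expires.

```python
from datetime import datetime, timedelta

checkpoints = {}  # (flow_run_id, task_name) -> result; only useful within one flow run
cache = {}        # task_name -> (result, expires_at); shared across flow runs
calls = []        # records every real execution of the task body


def expensive_task():
    calls.append("ran")
    return 42


def run_task(flow_run_id, task_name, fn, now, cache_for=None):
    # Restarting the same flow run: reuse its checkpoint.
    if (flow_run_id, task_name) in checkpoints:
        return checkpoints[(flow_run_id, task_name)]
    # A new flow run ignores old checkpoints but may hit a valid cache entry.
    if cache_for is not None and task_name in cache:
        result, expires_at = cache[task_name]
        if now < expires_at:
            return result
    # Otherwise run the task, checkpoint it, and cache it if requested.
    result = fn()
    checkpoints[(flow_run_id, task_name)] = result
    if cache_for is not None:
        cache[task_name] = (result, now + cache_for)
    return result


start = datetime(2022, 3, 8, 9, 0)
hour = timedelta(hours=1)

# Checkpointing only: the restart of run-1 reuses the checkpoint, run-2 runs again.
run_task("run-1", "load", expensive_task, start)
run_task("run-1", "load", expensive_task, start)
run_task("run-2", "load", expensive_task, start)
executions_without_cache = len(calls)

# With caching: run-4 falls inside the hour and is skipped, run-5 does not.
run_task("run-3", "cached_load", expensive_task, start, cache_for=hour)
run_task("run-4", "cached_load", expensive_task, start + timedelta(minutes=30), cache_for=hour)
run_task("run-5", "cached_load", expensive_task, start + timedelta(hours=2), cache_for=hour)
executions_total = len(calls)
```

In this model the task body runs twice in the checkpoint-only scenario and twice more in the cached scenario, since only the restart and the in-window run are skipped.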

Vadym Dytyniak

03/08/2022, 2:48 PM
OK, now it is clearer,
but it is still not clear how the cache stores data and where.
If I set `S3Result` and `cache_for=1h`,
will it delete the cached data from S3?

Kevin Kho

03/08/2022, 2:50 PM
That is two parts. The S3 Result is stored in S3. The cache with the one-hour expiry is stored on Prefect Cloud. So Prefect never deletes data. When the same task runs again in the future, it will check the cache, and if the cache is no longer valid, it will run the task again. So the Cached state is something we store in Cloud or Server.
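A rough way to picture the two parts, as a sketch only: `object_store` stands in for the S3 bucket and `backend_states` stands in for what Cloud or Server keeps. Both dictionaries, the functions `finish_task` and `lookup_cached`, and the location string are assumptions for illustration, not Prefect APIs.

```python
from datetime import datetime, timedelta

object_store = {}    # stand-in for the S3 bucket: location -> bytes
backend_states = {}  # stand-in for Cloud/Server: task name -> cache metadata


def finish_task(task_name, location, payload, now, cache_for):
    """Write the data to the store and record the expiry in the backend."""
    object_store[location] = payload
    backend_states[task_name] = {"location": location, "expires_at": now + cache_for}


def lookup_cached(task_name, now):
    """Return the stored data only while the cache entry is still valid."""
    state = backend_states.get(task_name)
    if state is None or now >= state["expires_at"]:
        return None  # expired or never cached: the task would run again
    return object_store.get(state["location"])


t0 = datetime(2022, 3, 8, 14, 0)
finish_task("extract", "results/extract.pkl", b"data", t0, timedelta(hours=1))
hit = lookup_cached("extract", t0 + timedelta(minutes=30))
miss = lookup_cached("extract", t0 + timedelta(hours=2))
still_stored = "results/extract.pkl" in object_store
```

Expiry here only changes the answer to "is the cache valid?". Nothing removes the object from the store, which matches the statement that Prefect never deletes data.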

Vadym Dytyniak

03/08/2022, 2:51 PM
Now it makes sense.
Before, you said that Output caching is also stored in the Result;
I probably misunderstood you.
And a last question: what does it mean that Input caching is deprecated?
With checkpointing enabled by default, is there no point in this feature?

Kevin Kho

03/08/2022, 2:57 PM
Sorry, my responses were too short because there’s just a lot to respond to this morning, haha. There was a feature where we held onto the inputs of a task execution so that they were kept in memory when you restarted from failure. I think this was just changed so that Prefect knows where to get them from instead. I don’t know exactly; I just found it reading the source code, but it’s from before my time.

Vadym Dytyniak

03/08/2022, 2:59 PM
Thank you for your time!!!

Kevin Kho

03/08/2022, 3:05 PM
Of course!

Vadym Dytyniak

03/08/2022, 3:25 PM
Sorry, one more thing. You said that checkpoints are only for retries, but what if the checkpoint target already exists for a new run?
Will it be ignored?

Kevin Kho

03/08/2022, 3:27 PM
If you mean this target: a target is a form of caching, so it persists across flow runs. If you just do `Result(…, location=…)`, then it will not be respected.
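The target behavior can be pictured as a file-existence check. The sketch below is an assumption-laden illustration in plain Python, not Prefect code: `run_with_target`, `compute`, and the temporary path are hypothetical, and `pickle` stands in for whatever serializer the result uses.

```python
import pickle
import tempfile
from pathlib import Path

executions = []  # records every real execution of the task body


def compute():
    executions.append("ran")
    return [1, 2, 3]


def run_with_target(fn, target: Path):
    """Skip `fn` when a file already exists at `target`; otherwise run and write it."""
    if target.exists():
        return pickle.loads(target.read_bytes())
    value = fn()
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(pickle.dumps(value))
    return value


target_path = Path(tempfile.mkdtemp()) / "compute.pkl"
first = run_with_target(compute, target_path)   # no file yet, so the task runs
second = run_with_target(compute, target_path)  # file exists, so the task is skipped
```

Because the check depends only on the file being present, it applies to any later run that resolves to the same path, which is why a target behaves like a cache rather than a per-run checkpoint.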

Vadym Dytyniak

03/08/2022, 3:28 PM
cool, thanks