Prefect Community #ask-community

Hawkar Mahmod

04/01/2021, 1:23 PM

Short and sweet, I hope. It isn’t clear to me from the docs where cached output data is persisted when using the Cloud backend. I know that locally it’s stored in memory, and that you cannot use the cache locally unless you make use of a backend API, but does this mean that the output data is actually stored in the backend (Cloud or Core server)? If so, does this not violate the principle that no flow data should be stored on the API side?

emre

04/01/2021, 1:45 PM

Actual task outputs are stored somewhere on your infra. The API side only stores where that somewhere is. Example: if a task output is stored as a pickle in S3, the API stores the S3 path:

s3://your_bucket/your/path/to/pickle

Check out results for more details: https://docs.prefect.io/core/concepts/results.html#results
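The split described above (payload in your storage, only its location on the API side) can be sketched in plain Python. This is a toy model, not Prefect code: `persist_output`, `api_record`, and the temporary directory standing in for an S3 bucket are all hypothetical names chosen for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def persist_output(value, storage_dir: Path, name: str) -> str:
    """Write a task output to storage you control and return its location."""
    target = storage_dir / f"{name}.pickle"
    target.write_bytes(pickle.dumps(value))
    return str(target)


# Hypothetical stand-in for what the API side records about a task run.
api_record = {}

# A temporary directory stands in for your own bucket or file system.
storage_dir = Path(tempfile.mkdtemp())
output = {"rows": [1, 2, 3]}

# Only the location string crosses over to the "API" side.
api_record["location"] = persist_output(output, storage_dir, "my_task")

# Anyone holding the location and access to the storage can load the value.
restored = pickle.loads(Path(api_record["location"]).read_bytes())
```

The point of the sketch is that `api_record` never holds the payload, so access to the data still depends on access to the storage itself.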

Hawkar Mahmod

04/01/2021, 1:46 PM

But from what I can see you are supposedly able to use the caching functionality without setting any Result subclass on the task, so how does it know where to store this?

emre

04/01/2021, 1:49 PM

I don't think you can 😅

emre

04/01/2021, 1:50 PM

So the caching docs say that the default cache is the `prefect.context` object, which is in memory and therefore short-lived. The persisting output section notes that you need a `Result` object to explicitly specify where and how your data will be stored.

Hawkar Mahmod

04/01/2021, 1:53 PM

It says that the cache is stored in context when running Prefect Core locally. What about when registering and running against the backend? That’s what’s not clear. The way the documentation is laid out seems to imply that caching and persisting output are two different things. In fact, the example given makes no mention of Results at all when demonstrating output caching. https://docs.prefect.io/core/concepts/persistence.html#output-caching

emre

04/01/2021, 3:18 PM

I see, the docs really are ambiguous in that sense. I've done some test runs on Prefect Server, and caching with only a `cache_for` and a `cache_key` isn't good enough. The first run notes that the cache is invalid and runs tasks normally. Subsequent runs mark the task as cached, meaning a cache has been found, but pass `None` to downstream tasks, failing the flow. Adding a `result` parameter, specifically a `LocalResult` object, made the non-cached task run persist its output, and subsequent runs used the cached value successfully.

Here is what I think is going on: a `Result` merely exists as a way to persist task outputs somewhere. It does not have to be involved with caching. For Prefect Core runs, the cache is simply `prefect.context`; `Result` configurations aren't involved here at all. For Server/Cloud runs, the cache is stored on the API side. But the cached value is not the data itself; it is the `Result` object's location. If the server API determines that a task has a valid cache, the cached location is used, alongside your `Result` configuration, to retrieve the actual value of your data.
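The behaviour observed in those test runs can be reproduced with a small toy model. Everything here is hypothetical and written only to mirror the explanation above: `ToyLocalResult`, `api_cache`, and `run_task` are illustration names, the state labels are plain strings, and none of this is Prefect's actual implementation.

```python
import pickle
import tempfile
from datetime import datetime, timedelta
from pathlib import Path
from typing import Any, Callable, Optional


class ToyLocalResult:
    """Hypothetical stand-in for a Result: knows how to write and read values."""

    def __init__(self, directory: Path):
        self.directory = directory

    def write(self, key: str, value: Any) -> str:
        path = self.directory / f"{key}.pickle"
        path.write_bytes(pickle.dumps(value))
        return str(path)

    def read(self, location: str) -> Any:
        return pickle.loads(Path(location).read_bytes())


# Hypothetical API-side cache: cache key -> (location or None, expiration).
api_cache: dict = {}


def run_task(fn: Callable[[], Any], cache_key: str, cache_for: timedelta,
             result: Optional[ToyLocalResult], now: datetime):
    """Return (state, value) for one toy task run."""
    entry = api_cache.get(cache_key)
    if entry is not None and entry[1] > now:
        location = entry[0]
        # A cache hit can only be loaded if a location was recorded
        # and there is a result object to read it with.
        value = result.read(location) if (result and location) else None
        return "Cached", value

    value = fn()
    location = result.write(cache_key, value) if result else None
    api_cache[cache_key] = (location, now + cache_for)
    return "Success", value


now = datetime(2021, 4, 1, 15, 0)
hour = timedelta(hours=1)
later = now + timedelta(minutes=5)

# Without a result: the second run is "Cached" but has nothing to load.
first = run_task(lambda: 42, "no-result", hour, None, now)
second = run_task(lambda: 42, "no-result", hour, None, later)

# With a result: the second run loads the value from the recorded location.
store = ToyLocalResult(Path(tempfile.mkdtemp()))
third = run_task(lambda: 42, "with-result", hour, store, now)
fourth = run_task(lambda: 42, "with-result", hour, store, later)
```

In this model the cache entry alone is enough to mark a run as cached, but the value only comes back when a location was recorded and a result object is available to read it, which matches the `None` seen downstream in the runs without a `result`.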

Zanie

04/01/2021, 3:25 PM

You're on the right track! `Results` bridge the gap between the API and your runtime environment. Since the API is designed to maintain separation from your data, we need a way to tell the API where the data is stored in your own infrastructure.

Zanie

04/01/2021, 3:27 PM

The output caching without a result (per the linked doc) is all within a single flow (because it is stored in memory, it is not persisted). This is helpful if a single task may be called multiple times in the same flow.
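A rough sketch of why an in-memory cache helps repeated calls but does not persist. `ToyContext` is a hypothetical stand-in for in-memory state such as `prefect.context`, and creating a fresh `ToyContext` is assumed here to represent starting over in a new process. It is not how Prefect is implemented.

```python
from typing import Any, Callable, Dict


class ToyContext:
    """Hypothetical stand-in for in-memory state; it vanishes with its owner."""

    def __init__(self) -> None:
        self.caches: Dict[str, Any] = {}


calls = 0


def expensive() -> int:
    """Count invocations so cache hits are visible."""
    global calls
    calls += 1
    return 42


def run_cached(ctx: ToyContext, key: str, fn: Callable[[], Any]) -> Any:
    """Run fn once per key for as long as this context object is alive."""
    if key not in ctx.caches:
        ctx.caches[key] = fn()
    return ctx.caches[key]


# Same context: the task body runs once, later calls reuse the value.
ctx = ToyContext()
values = [run_cached(ctx, "expensive", expensive) for _ in range(3)]
calls_with_shared_context = calls

# Fresh context (assumed to model a new process): nothing was persisted.
fresh = ToyContext()
value_after_restart = run_cached(fresh, "expensive", expensive)
calls_after_restart = calls
```

Because nothing is written outside memory, the second context has to recompute the value, which is the sense in which the cache is not persisted.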

Jeremy Tee

04/02/2021, 1:42 AM

@Wai Kiat Tan

Hawkar Mahmod

04/06/2021, 7:48 AM

@emre thank you for that exploration and summary - very helpful. @Zanie thank you also. Is the output caching also used to retry flow runs?

Zanie

04/06/2021, 5:39 PM

When you use a result / checkpointing, the persisted outputs can be used for retries.

Zanie

04/06/2021, 5:39 PM

Without including a result type, the output cache is ephemeral
