emre

01/05/2021, 1:44 PM
Hi everyone, I am trying to wrap my head around result caching 😅. On a Core-only run on my workstation, I keep failing to reuse the result of a long-running task. My latest attempt is as follows:
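```python
from datetime import timedelta

from prefect.engine.results import LocalResult

# SnowflakePandasResultTask, SNOW_DB and info_query are defined elsewhere (not shown)
meta_df = SnowflakePandasResultTask(
    db=SNOW_DB,
    checkpoint=True,
    result=LocalResult(dir=".prefect_cache"),
    cache_for=timedelta(days=14),
    cache_key="snow_pandas_out",
)(query=info_query)
```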
This persists files with arbitrary names under `.prefect_cache`. On every run, though, I get a warning that my cache is no longer valid. Can anyone point me to what I am doing wrong?

Chris White

01/05/2021, 4:07 PM
Hi @emre! Each time you run the flow containing this task, are you doing so from a new process?

emre

01/05/2021, 4:09 PM
I think so. I run it from the terminal, and every run builds the flow, runs it, and then exits back to the terminal.

Chris White

01/05/2021, 4:12 PM
OK, gotcha. When you use `flow.run` alone, all previously cached states are stored in memory, which means that if you call it from new processes they have no way of sharing that information. However, there is a relatively simple workaround: all cached states from all tasks are stored in `prefect.context.caches`, so if you save this after each run and load it before each run, it should start behaving as you expect. Something like:
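```python
import cloudpickle
import prefect

# on save (after flow.run): persist the cached states of all tasks
with open(".prefect_cache/THE_CACHE.pkl", "wb") as f:
    cloudpickle.dump(prefect.context.caches, f)

# on load (before flow.run): restore the cached states into the context
with open(".prefect_cache/THE_CACHE.pkl", "rb") as f:
    the_cache = cloudpickle.load(f)
    prefect.context.update(caches=the_cache)
```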
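To see the same save/load idea without Prefect, here is a minimal, stdlib-only sketch. It is an illustration under stated assumptions, not Prefect's implementation: the helper names `load_cache`, `save_cache` and `cached_call` are hypothetical, each entry stores an expiration timestamp in the spirit of `cache_for` and is keyed like `cache_key`, whereas Prefect itself keeps richer cached `State` objects in `prefect.context.caches`. It uses plain `pickle`, which is enough for simple values; `cloudpickle` (as above) can also serialize objects such as lambdas that `pickle` rejects.

```python
import os
import pickle
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical on-disk location, mirroring the path used in the thread.
CACHE_PATH = os.path.join(".prefect_cache", "THE_CACHE.pkl")

# key -> (expiration timestamp, cached value); a simplified stand-in for prefect.context.caches
Cache = Dict[str, Tuple[float, Any]]


def load_cache(path: str = CACHE_PATH) -> Cache:
    """Load entries saved by an earlier process, or start with an empty cache."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def save_cache(cache: Cache, path: str = CACHE_PATH) -> None:
    """Write the cache to disk so the next process can reuse it."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(cache, f)


def cached_call(cache: Cache, key: str, ttl_seconds: float,
                fn: Callable[..., Any], *args: Any) -> Any:
    """Return the cached value for `key` if it has not expired; otherwise recompute and store it."""
    now = time.time()
    entry = cache.get(key)
    if entry is not None and now < entry[0]:
        return entry[1]  # cache hit: the expiration time is still in the future
    value = fn(*args)
    cache[key] = (now + ttl_seconds, value)
    return value
```

Usage would look like `cache = load_cache()`, then `cached_call(cache, "snow_pandas_out", 14 * 24 * 3600, run_query)`, then `save_cache(cache)`, where `run_query` is a hypothetical stand-in for the long-running task. Because the dictionary is written to disk, a fresh process picks up unexpired entries instead of recomputing them, which the in-memory cache behind `flow.run` cannot do on its own.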

emre

01/05/2021, 4:41 PM
Thanks @Chris White, that worked like a charm! By the way, having this behavior built in would be very useful for me, and I would like to see it as a feature. If it sounds OK to you, I'd like to try adding it as an option to Prefect Core.

Chris White

01/05/2021, 4:43 PM
In general, for any sort of stateful work we recommend that people use Prefect Server or Prefect Cloud, so I’d be hesitant to include this in Core alone: it would require more configuration for caching (where to store the cache), which is already a little confusing for folks.

emre

01/05/2021, 5:12 PM
I see, makes sense 🙂