Thread
#prefect-community
    emre

    1 year ago
    Hi everyone, I am trying to wrap my head around result caching 😅. On a Core-only run on my workstation, I keep failing to reuse the result of a long-running task. My latest attempt is as follows:
    meta_df = SnowflakePandasResultTask(
        db=SNOW_DB,
        checkpoint=True,
        result=LocalResult(dir=".prefect_cache"),
        cache_for=timedelta(days=14),
        cache_key="snow_pandas_out",
    )(query=info_query)
    This persists files with arbitrary names under .prefect_cache. On every run I get a warning that my cache is no longer valid. Can anyone point me to what I am doing wrong?
    Chris White

    1 year ago
    Hi @emre! Each time you run the flow containing this task, are you doing so from a new process?
    emre

    1 year ago
    I think so. I run from the terminal, and every run builds the flow, runs it, and then exits back to the terminal.
    Chris White

    1 year ago
    OK, gotcha. When using flow.run alone, all previously cached runs are stored in memory, so separate processes have no way of sharing them. However, there is a relatively simple workaround: all cached states from all tasks are stored in prefect.context.caches, so if you save this after each run and load it before each run, it should start behaving as you expect. Something like:
    import cloudpickle
    import prefect

    # on save, after flow.run() finishes
    with open(".prefect_cache/THE_CACHE.pkl", "wb") as f:
        cloudpickle.dump(prefect.context.caches, f)

    # on load, before the next flow.run()
    with open(".prefect_cache/THE_CACHE.pkl", "rb") as f:
        the_cache = cloudpickle.load(f)
        prefect.context.update(caches=the_cache)
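    The save and load steps above could be wrapped into two small helpers so that a first run (no cache file yet) and an interrupted write are both handled. This is only a sketch: load_caches, save_caches, and the atomic-write approach are hypothetical names and choices, not part of Prefect, and the fallback to the standard pickle module is an assumption for environments without cloudpickle.

```python
import os
import pickle
import tempfile

try:
    import cloudpickle as pickler  # the serializer used in the thread
except ImportError:
    pickler = pickle  # assumption: plain pickle is enough for the cached objects


def load_caches(path):
    """Return the saved cache dict, or an empty dict if no file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        # cloudpickle output is readable with the standard pickle loader
        return pickle.load(f)


def save_caches(caches, path):
    """Write the cache dict to a temp file, then rename it into place."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickler.dump(dict(caches), f)
    # os.replace is atomic, so a crash mid-write leaves the old file intact
    os.replace(tmp_path, path)


# Hypothetical usage with Prefect Core (not executed here):
#   import prefect
#   CACHE_PATH = ".prefect_cache/THE_CACHE.pkl"
#   prefect.context.update(caches=load_caches(CACHE_PATH))
#   flow.run()
#   save_caches(prefect.context.caches, CACHE_PATH)
```

    Returning an empty dict when the file is missing means the very first run needs no special casing.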
    emre

    1 year ago
    Thanks @Chris White, that worked like a charm! Btw, having this behavior built in would be very useful for me, and I would like to see it as a feature. If that sounds OK to you, I would like to try adding it as an option to Prefect Core.
    Chris White

    1 year ago
    In general, for any sort of stateful work we recommend people use Prefect Server or Prefect Cloud, so I'd be hesitant to include this in Core alone; it will require more configuration for caching (where to store the cache), which is already a little confusing for folks.
    emre

    1 year ago
    I see, makes sense 🙂