Prefect Community #ask-community


Krzysztof Nawara

10/09/2020, 8:06 PM

Hello all 🙂 Is it possible to have dynamic cache keys? Currently they can be templated, so they are semi-dynamic, but I haven't found any way to make data that's passed through the pipeline part of the key. Use case: caching mapped tasks where the order can be non-deterministic, so `prefect.context.map_index` might not be enough.

nicholas

10/09/2020, 8:38 PM

Hi @Krzysztof Nawara, can you explain a little further what you're trying to achieve? A small code sample might help as well.

Krzysztof Nawara

10/09/2020, 9:58 PM

Hi @nicholas 🙂 The upstream task produces a list of files that I want to process, so I'm using a mapped task. Now I want to cache the results of those mapped tasks, but if the list grows, shrinks, or changes, cache keys that rely on `map_index` will cause incorrect results to be read from the cache. Does it make more sense now?
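To make the failure mode concrete, here is a minimal, Prefect-free sketch. A plain dict stands in for the result cache and the hypothetical `process` function stands in for the mapped task; neither is Prefect API. Keying on the position (as a `map_index`-based key would) returns stale results once the upstream list shifts, while keying on the file name does not.

```python
from typing import Callable, Dict, List, Tuple

CacheKey = Tuple[str, object]


def process(file_name: str) -> str:
    """Hypothetical mapped-task body: pretend to process one file."""
    return f"processed:{file_name}"


def run_mapped(
    files: List[str],
    cache: Dict[CacheKey, str],
    key_fn: Callable[[int, str], CacheKey],
) -> List[str]:
    """Apply `process` to each file, reusing cached results looked up by `key_fn`."""
    results = []
    for index, file_name in enumerate(files):
        key = key_fn(index, file_name)
        if key not in cache:  # cache miss: compute and store the result
            cache[key] = process(file_name)
        results.append(cache[key])
    return results


def by_index(index: int, file_name: str) -> CacheKey:
    # Analogous to a cache key built from prefect.context.map_index.
    return ("process", index)


def by_file(index: int, file_name: str) -> CacheKey:
    # Key derived from the data flowing through the pipeline instead.
    return ("process", file_name)


index_cache: Dict[CacheKey, str] = {}
file_cache: Dict[CacheKey, str] = {}

# First run populates both caches.
run_mapped(["a.csv", "b.csv"], index_cache, by_index)
run_mapped(["a.csv", "b.csv"], file_cache, by_file)

# Second run: a new file appears at the front, shifting every map index.
new_files = ["c.csv", "a.csv", "b.csv"]
stale = run_mapped(new_files, index_cache, by_index)
fresh = run_mapped(new_files, file_cache, by_file)

print(stale)  # ['processed:a.csv', 'processed:b.csv', 'processed:b.csv']
print(fresh)  # ['processed:c.csv', 'processed:a.csv', 'processed:b.csv']
```

With the index-based key, `c.csv` silently receives the cached result for `a.csv`, which is the incorrect read described above.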

nicholas

10/09/2020, 10:21 PM

Definitely, thank you @Krzysztof Nawara! I would try something like this:

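```python
from datetime import timedelta

from prefect import task
from prefect.engine.cache_validators import partial_inputs_only


@task(
    cache_key="some_global_key",  # to share among mapped children
    cache_for=timedelta(days=1),
    cache_validator=partial_inputs_only(validate_on=["x", "y"]),
)
def add(x, y):
    return x + y
```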

where you can validate the cache on something like the name of the file that you pass from the upstream task, and on any number of other inputs.
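A Prefect-free simulation of this suggestion, under explicit assumptions: all mapped children write to one shared key, each stored entry keeps the inputs it was computed from, and a lookup scans the candidates under that key and returns the first one the validator accepts. `CachedEntry`, `partial_inputs_validator` and `run_cached` are hypothetical names; the validator only mimics the idea behind `partial_inputs_only(validate_on=[...])`, and whether Prefect itself scans candidates this way is an assumption here, not something the thread confirms.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, List

Inputs = Dict[str, Any]


@dataclass
class CachedEntry:
    """Hypothetical stand-in for a cached task state."""

    result: Any
    cached_inputs: Inputs  # inputs of the run that produced `result`
    expires: datetime


Validator = Callable[[CachedEntry, Inputs], bool]


def partial_inputs_validator(validate_on: List[str]) -> Validator:
    """Accept an unexpired entry whose selected inputs equal the current ones."""

    def validator(entry: CachedEntry, inputs: Inputs) -> bool:
        if entry.expires <= datetime.now():
            return False
        return all(entry.cached_inputs.get(k) == inputs.get(k) for k in validate_on)

    return validator


# A single shared key maps to many entries, one per mapped child.
cache: Dict[str, List[CachedEntry]] = {}


def run_cached(
    key: str,
    fn: Callable[..., Any],
    inputs: Inputs,
    validator: Validator,
    cache_for: timedelta,
) -> Any:
    """Return the first candidate that validates; otherwise compute and store."""
    for entry in cache.get(key, []):
        if validator(entry, inputs):
            return entry.result
    result = fn(**inputs)
    cache.setdefault(key, []).append(
        CachedEntry(result, dict(inputs), datetime.now() + cache_for)
    )
    return result


calls = []


def add(x: int, y: int) -> int:
    calls.append((x, y))  # record real executions (cache misses)
    return x + y


check_xy = partial_inputs_validator(["x", "y"])
one_day = timedelta(days=1)

# First run: two mapped children, both cache misses.
for x, y in [(1, 2), (3, 4)]:
    run_cached("some_global_key", add, {"x": x, "y": y}, check_xy, one_day)

# Second run with the mapped inputs in a different order: both are hits.
results = [
    run_cached("some_global_key", add, {"x": x, "y": y}, check_xy, one_day)
    for x, y in [(3, 4), (1, 2)]
]
print(results)  # [7, 3]
print(calls)  # [(1, 2), (3, 4)]
```

Because the lookup compares the values of `x` and `y` rather than positions, reordering the mapped inputs still yields cache hits, and `add` only runs twice in total.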

Krzysztof Nawara

10/10/2020, 8:11 AM

Does that mean Prefect will iterate over all matching cache entries (with the same cache key) until it finds one for which the validator returns True? And another question: how is that behaviour implemented under the hood? From what I have seen in the signature, `cache_validator` doesn't get access to the input values of the current execution, only to those of previous runs?

```
- state (State): a `Success` state from the last successful Task run that contains the cache
- inputs (dict): a `dict` of inputs that were available on the last
    successful run of the cached Task
- parameters (dict): a `dict` of parameters that were available on the
    last successful run of the cached Task
```

Krzysztof Nawara

10/10/2020, 3:34 PM

I decided to check the source code, and I'm even more confused: while `state` does come from the cache, both `inputs` and `parameters` come from the current run of the flow, not the cached one.
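Read that way, a custom validator has to compare what the candidate state remembers against what the current run passes in. A minimal sketch using the three-argument `(state, inputs, parameters)` signature from the docstring above, assuming the candidate state exposes the inputs that produced it as a `cached_inputs` mapping of plain values. The `SimpleNamespace` stand-in and `same_file_validator` are hypothetical; check how your Prefect version actually stores cached inputs before relying on this.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def same_file_validator(
    state: Any, inputs: Dict[str, Any], parameters: Optional[Dict[str, Any]]
) -> bool:
    """Hypothetical validator: reuse a cached state only for the same file.

    `state` is the candidate pulled from the cache, so the inputs that produced
    it are read off the state itself; `inputs` and `parameters` describe the
    current run.
    """
    cached_inputs = getattr(state, "cached_inputs", None) or {}
    return "file_name" in inputs and cached_inputs.get("file_name") == inputs["file_name"]


# Stand-in for a cached state produced while processing "a.csv".
candidate = SimpleNamespace(cached_inputs={"file_name": "a.csv"})

print(same_file_validator(candidate, {"file_name": "a.csv"}, {}))  # True
print(same_file_validator(candidate, {"file_name": "b.csv"}, {}))  # False
```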

Krzysztof Nawara

10/10/2020, 3:35 PM

nicholas

10/12/2020, 5:28 PM

Hi @Krzysztof Nawara - I'm a little confused, are you running into a problem with caching?

Krzysztof Nawara

10/12/2020, 6:07 PM

Not really a problem, but I don't think I understand how the cache validator is designed to work. It seems to me that the docstring descriptions of the arguments are inconsistent with the implementation.

nicholas

10/12/2020, 6:07 PM

Hm, got it. If you see inconsistencies, that's definitely something you could open a PR for! I know some work was done on those docstrings recently, since we added Results.

Krzysztof Nawara

10/14/2020, 7:44 AM

@nicholas Makes sense, thanks for the help 🙂
