
Krzysztof Nawara

10/09/2020, 8:06 PM
Hello all 🙂 Is it possible to have dynamic cache keys? Currently they can be templated, so they are semi-dynamic, but I haven't found any way to make data that's passed through the pipeline part of the key. Use case: caching mapped tasks where the order can be non-deterministic, so `prefect.context.map_index` might not be enough.

nicholas

10/09/2020, 8:38 PM
Hi @Krzysztof Nawara - can you help explain a little further what you're trying to achieve? Maybe a small code sample might help as well

Krzysztof Nawara

10/09/2020, 9:58 PM
Hi @nicholas 🙂 So the upstream task produces a list of files that I want to process, so I'm using a mapped task. Now I want to cache the results of those mapped tasks, but if the list grows/shrinks/changes, cache keys that rely on `map_index` will cause incorrect results to be read from the cache. Does it make more sense now?
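To make the concern concrete, here is a minimal pure-Python sketch (not Prefect code) of why an index-based key goes stale when the file list changes, while a key derived from the data itself does not. The names `process`, `run_mapped`, `by_index` and `by_value` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, Hashable, List


def process(path: str) -> str:
    """Stand-in for the real per-file work (hypothetical)."""
    return f"processed:{path}"


def run_mapped(
    files: List[str],
    cache: Dict[Hashable, str],
    key_fn: Callable[[int, str], Hashable],
) -> List[str]:
    """Apply `process` to every file, reusing any result cached under the same key."""
    results = []
    for index, path in enumerate(files):
        key = key_fn(index, path)
        if key not in cache:
            cache[key] = process(path)
        results.append(cache[key])
    return results


def by_index(index: int, path: str) -> Hashable:
    """Key that depends only on the position in the mapped list."""
    return index


def by_value(index: int, path: str) -> Hashable:
    """Key that depends on the data flowing through the pipeline."""
    return path


index_cache: Dict[Hashable, str] = {}
value_cache: Dict[Hashable, str] = {}

first_run = ["a.csv", "b.csv"]
second_run = ["b.csv", "c.csv"]  # the upstream list changed between runs

run_mapped(first_run, index_cache, by_index)
run_mapped(first_run, value_cache, by_value)

stale = run_mapped(second_run, index_cache, by_index)
fresh = run_mapped(second_run, value_cache, by_value)
```

With the index-based key, the second run picks up the results cached for `a.csv` and `b.csv` at positions 0 and 1, even though those positions now hold `b.csv` and `c.csv`. The value-based key reuses only the `b.csv` result and computes `c.csv` fresh.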

nicholas

10/09/2020, 10:21 PM
Definitely, thank you @Krzysztof Nawara! I would try something like this:
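from datetime import timedelta
from prefect import task
from prefect.engine.cache_validators import partial_inputs_only

@task(
    cache_key="some_global_key",  # to share among mapped children
    cache_for=timedelta(days=1),
    cache_validator=partial_inputs_only(validate_on=["x", "y"]))
def add(x, y):
    return x + y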
where you can validate the cache on something like the name of the file that you pass from the upstream task, and any other number of inputs

Krzysztof Nawara

10/10/2020, 8:11 AM
Does it mean that Prefect will iterate over all matching cache entries (with the same cache key) until it finds one for which the validator returns True? And another question: how is that behaviour implemented under the hood? From what I have seen in the signature, cache_validator doesn't get access to the input values of the current execution, only to the previous ones?
- state (State): a `Success` state from the last successful Task run that contains the cache
- inputs (dict): a `dict` of inputs that were available on the last
            successful run of the cached Task
- parameters (dict): a `dict` of parameters that were available on the
            last successful run of the cached Task
I decided to check the source code and I'm even more confused - while state indeed comes from cache, both inputs and parameters come from the current run of the flow, not cached one.
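A toy model may help pin down the behaviour being asked about. The sketch below is plain Python, not Prefect's source: it assumes that several entries can share one cache key, that each entry remembers the inputs it was computed from, and that the validator compares those remembered inputs against the inputs of the current run. `CachedEntry`, `inputs_match` and `lookup` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class CachedEntry:
    """One cached result plus the inputs that produced it (hypothetical model)."""
    result: Any
    cached_inputs: Dict[str, Any]


Validator = Callable[[CachedEntry, Dict[str, Any]], bool]


def inputs_match(validate_on: List[str]) -> Validator:
    """Build a validator that accepts an entry only if the named inputs are equal."""
    def validator(entry: CachedEntry, current_inputs: Dict[str, Any]) -> bool:
        return all(
            name in entry.cached_inputs
            and name in current_inputs
            and entry.cached_inputs[name] == current_inputs[name]
            for name in validate_on
        )
    return validator


def lookup(
    cache: Dict[str, List[CachedEntry]],
    cache_key: str,
    current_inputs: Dict[str, Any],
    validator: Validator,
) -> Optional[CachedEntry]:
    """Scan every entry stored under cache_key and return the first valid one."""
    for entry in cache.get(cache_key, []):
        if validator(entry, current_inputs):
            return entry
    return None


cache = {
    "some_global_key": [
        CachedEntry("processed:a.csv", {"path": "a.csv"}),
        CachedEntry("processed:b.csv", {"path": "b.csv"}),
    ]
}
check_path = inputs_match(["path"])

hit = lookup(cache, "some_global_key", {"path": "b.csv"}, check_path)
miss = lookup(cache, "some_global_key", {"path": "c.csv"}, check_path)
```

In this model the order of the mapped children does not matter: a child is served from the cache only when an entry with the same `path` exists under the shared key. Whether Prefect behaves exactly this way is the open question in this thread.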

nicholas

10/12/2020, 5:28 PM
Hi @Krzysztof Nawara - I'm a little confused, are you running into a problem with caching?

Krzysztof Nawara

10/12/2020, 6:07 PM
Not really a problem, but I don't think I understand how the cache validator is designed to work. It seems to me that the docstring comments on its arguments are inconsistent with the implementation.

nicholas

10/12/2020, 6:07 PM
Hm, got it. That's definitely something you could PR where you see inconsistencies! I know there was a little work done on those docstrings recently since we added Results

Krzysztof Nawara

10/14/2020, 7:44 AM
@nicholas Makes sense, thanks for the help 🙂