
Andrey Tatarinov

12/14/2020, 7:14 PM
Question about caching behaviour: I'm actively developing a flow that runs with a K8s agent and is packaged in Docker. There's one task that takes a lot of time. The task is decorated with `result=GCSResult` and `cache_for=timedelta(hours=1)`. I notice that when I don't rebuild the Docker image, Prefect respects the cache, i.e. the second run goes much faster than the first. But it seems that each rebuild of the image invalidates the cache. Q: is that true? How can I get more insight into how caching works?

Kyle Moon-Wright

12/14/2020, 7:46 PM
Hey @Andrey Tatarinov, if I'm not mistaken, the cache should be respected for your subsequent runs despite rebuilding the image for that result type. Did you specify a `cache_validator` at all? There are a variety of cache validators available to check the validity of your result, which may be of some interest to you. For example, if the subsequent run had different parameters, the cache won't be respected when you use `cache_validator=prefect.engine.cache_validators.all_parameters` rather than the default.
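For readers following along, the difference between the two validators mentioned here can be sketched in plain Python. This is a simplified model and not Prefect's source: `CachedEntry`, `duration_only_model`, and `all_parameters_model` are hypothetical names, and the sketch assumes the default validator checks only the expiry time while `all_parameters` also requires the flow parameters to match.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class CachedEntry:
    """Hypothetical stand-in for a cached task state (not a Prefect class)."""
    expires_at: datetime
    parameters: dict = field(default_factory=dict)


def duration_only_model(entry: CachedEntry, now: datetime) -> bool:
    # Assumed behaviour: the cache is valid as long as it has not expired.
    return now < entry.expires_at


def all_parameters_model(entry: CachedEntry, now: datetime, parameters: dict) -> bool:
    # Assumed behaviour: not expired AND the run's parameters match the cached ones.
    return duration_only_model(entry, now) and entry.parameters == parameters


start = datetime(2020, 12, 14, 19, 0)
entry = CachedEntry(expires_at=start + timedelta(hours=1), parameters={"day": "mon"})
later = start + timedelta(minutes=30)

same_params_hit = all_parameters_model(entry, later, {"day": "mon"})
changed_params_hit = all_parameters_model(entry, later, {"day": "tue"})
duration_hit = duration_only_model(entry, later)
expired_hit = duration_only_model(entry, start + timedelta(hours=2))
```

In this model a parameter change alone is enough to miss the cache under the stricter validator, while the duration-only check ignores parameters entirely.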

Andrey Tatarinov

12/14/2020, 7:52 PM
I did not specify a cache validator, and I expected that `duration_only` would be used.
How do I debug this behaviour?
Just confirmed it again: rerunning without re-registering uses the cache; re-registering results in a run as if there were no cached results.

Kyle Moon-Wright

12/14/2020, 8:04 PM
IIRC, a newly registered flow to Prefect Cloud creates its own history with that image/storage/configurations, which would also correspond to a new result cache for that registered flow version. So that first run after a fresh registration would initiate the cache for subsequent runs… of that version of your flow.
Can you confirm that on the GCS side perhaps?

Andrey Tatarinov

12/14/2020, 8:24 PM
What should I check?

Kyle Moon-Wright

12/14/2020, 8:50 PM
Hmm, perhaps trying a run with a LocalResult and caching is a good place to start: check whether the cache is respected for the duration across all versions of your flow, or whether it resets with a new version. If a new result is created, then the cache corresponds only to a flow's active version and its configuration, in which case the result cache lives only for that flow version's specified duration.
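The two outcomes this experiment would distinguish can be illustrated with a toy model. It is only a sketch under stated assumptions: `ToyCache`, `experiment`, and the `per_version` switch are hypothetical and do not reflect how Prefect stores cached states. The sketch shows what each hypothesis predicts for a run, a rerun, and a run after re-registration.

```python
from datetime import datetime, timedelta


class ToyCache:
    """Hypothetical in-memory cache; per_version switches between the two hypotheses."""

    def __init__(self, per_version: bool, ttl: timedelta):
        self.per_version = per_version
        self.ttl = ttl
        self._store = {}  # cache key -> expiry time

    def _key(self, task_name: str, flow_version: int):
        # Hypothesis A: the key includes the flow version. Hypothesis B: it does not.
        return (task_name, flow_version) if self.per_version else (task_name,)

    def run(self, task_name: str, flow_version: int, now: datetime) -> str:
        key = self._key(task_name, flow_version)
        expires_at = self._store.get(key)
        if expires_at is not None and now < expires_at:
            return "cached"
        # Cache miss or expired entry: "execute" the task and record a new expiry.
        self._store[key] = now + self.ttl
        return "executed"


def experiment(per_version: bool) -> list:
    cache = ToyCache(per_version=per_version, ttl=timedelta(hours=1))
    t0 = datetime(2020, 12, 14, 20, 0)
    return [
        cache.run("slow_task", flow_version=1, now=t0),                          # first run
        cache.run("slow_task", flow_version=1, now=t0 + timedelta(minutes=10)),  # rerun
        cache.run("slow_task", flow_version=2, now=t0 + timedelta(minutes=20)),  # after re-registering
    ]


scoped = experiment(per_version=True)
shared = experiment(per_version=False)
```

Here `scoped` ends with an `"executed"` run after the version bump, which matches the behaviour reported earlier in the thread, while `shared` would still hit the cache.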