Prefect Community #ask-community

Matheus Calvelli

02/23/2021, 12:46 PM

Hello, everyone. We've started working with Prefect in our team, and some things about caching don't seem to add up for me. Could someone help me understand? According to this issue: https://github.com/PrefectHQ/prefect/issues/1221 it seems Prefect does not allow persistent caching of tasks across different runs, but it should allow it for scheduled runs (there is also this pull request, which discusses it: https://github.com/PrefectHQ/prefect/pull/1226). However, this does not seem to be the case either. I tried caching the results of a couple of tasks and scheduling the flow to run again a few minutes later, and the cached tasks didn't work, which in turn made the whole flow fail. Could someone explain how exactly caching works? And is there a way I can "back up" the results of tasks in order to iterate over models (which is what I wanted to do with cached tasks in the first place)?

Jenny

02/23/2021, 2:03 PM

Hi @Matheus Calvelli - looks like you've done some good research! The docs on caching should help here: https://docs.prefect.io/core/concepts/persistence.html#input-caching I think output caching and cache_key should help you.

Matheus Calvelli

02/23/2021, 2:46 PM

Hi @Jenny, thank you for the reply. I did read those docs, but I couldn't get output caching to work, either on its own or with cache_keys. Even though the agent reported the tasks as successfully cached, the next flow runs, which depended on those cached tasks, all failed. From what I could tell, the cache_validator returned an OK status, as it should, but the tasks couldn't find or load the data as expected. Am I missing something? Are there any specific server configurations needed to make caching work? Any help would be much appreciated. Note: I did manage to get result checkpoints to work, though. So maybe the intended behavior is that we should use results for this, and caching only for tasks that will be used multiple times within a single flow or within a collection of flows executed from the same script? Thanks in advance.
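
For anyone reading along, the "backup" idea above (write a task's output to a fixed location, then reuse it on the next run instead of recomputing) can be sketched in plain Python. This is not Prefect's API and not necessarily how Prefect implements checkpoints: `persisted`, `CACHE_DIR`, and `build_features` are hypothetical names, and pickling to a local temp directory is an assumption made only for illustration.

```python
import pickle
import tempfile
from functools import wraps
from pathlib import Path

# Hypothetical location for persisted outputs (an assumption, not a Prefect setting).
CACHE_DIR = Path(tempfile.gettempdir()) / "task_results_sketch"


def persisted(target):
    """Store the wrapped function's return value in CACHE_DIR/target.

    If the target file already exists, load and return it instead of
    calling the function. The arguments are NOT part of the key, so
    the stored value is reused no matter what inputs are passed.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            path = CACHE_DIR / target
            if path.exists():
                # A previous run (possibly another process) already wrote it.
                with path.open("rb") as fh:
                    return pickle.load(fh)
            value = func(*args, **kwargs)
            CACHE_DIR.mkdir(parents=True, exist_ok=True)
            with path.open("wb") as fh:
                pickle.dump(value, fh)
            return value
        return wrapper
    return decorator


# Counter used only to show how often the expensive step really runs.
calls = {"count": 0}


@persisted("features.pkl")
def build_features():
    """Stand-in for an expensive step whose output should survive between runs."""
    calls["count"] += 1
    return [x * x for x in range(5)]
```

Because the value lives on disk rather than in the memory of one process, a second run started minutes later can pick it up, which an in-memory cache cannot do. The trade-off in this sketch is that nothing invalidates the file: delete it, or change the target name, to force a recompute.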
