Jeff Quinn

10/03/2022, 5:13 PM
Hello all, love the project. One question: I find myself really wanting a feature flag to turn task-level result caching on and off. My use case seems common enough: if I update my ETL code, I want to invalidate the cache so that new results are produced for existing input data. Keying the cache on a hash of the input arguments doesn't solve this; the input data is identical, it's my code that has changed. Time-based cache expiration doesn't fit either; I don't know when my code will be updated, it might be tomorrow or it might be in a year. It seems like Prefect 1.0 had some options in the API for this use case. What am I misunderstanding about Prefect 2.0, which as far as I can tell has no clear way to address this?
Oh wow, now that I'm looking at the docs more closely, maybe Prefect 2 is inspecting the task definitions themselves, creating a hash of the source code, and invalidating the cache if the source code changes? That is really incredible if I'm understanding it right.
Just tested it, and this is how it's working: it detects changes in the task source code. Very cool. It even knows to ignore changes to comments, awesome!
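If the key does include the task function's compiled bytecode (an assumption here, based on how task_input_hash was written in the Prefect 2.x source around this release, where it hashes the task key, `fn.__code__.co_code`, and the arguments), the comment behavior follows from how CPython compiles functions: comments produce no bytecode, while a change to the function's operations does. A stdlib-only sketch with a hypothetical `code_fingerprint` helper:

```python
import hashlib


def code_fingerprint(fn):
    # Hypothetical helper: hash only the function's raw bytecode,
    # the part of the cache key assumed to track code changes.
    return hashlib.sha256(fn.__code__.co_code).hexdigest()


def version_a(x):
    return x * 2


def version_b(x):
    # Comments compile to nothing, so this body has the same bytecode.
    return x * 2  # trailing comments are dropped as well


def version_c(x):
    return x * 2 + 1  # an extra operation changes the bytecode


print(code_fingerprint(version_a) == code_fingerprint(version_b))  # True
print(code_fingerprint(version_a) == code_fingerprint(version_c))  # False
```

Because `co_code` refers to constants and global names by index rather than by value, some genuine edits can leave it unchanged.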
It does seem to have some corner cases, though, like this one:
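from prefect import flow, task
from prefect.tasks import task_input_hash

# Module-level data that the task reads at run time
Z = {
    1: 1
}

@task(cache_key_fn=task_input_hash)
def task1():
    # Editing the values in Z does not change this function's code,
    # so the previously cached result is still returned
    return Z[1]

@flow
def flow1():
    return task1()

if __name__ == "__main__":
    flow1()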
If the values in some mutable data structure change, the cache isn't expired.
Ah, there are actually a ton of corner cases. I really think I do need a way to manually invalidate the cache; a lot of my code updates won't be detected by this system.
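Under the same bytecode assumption, the key for task1 covers its code and its arguments, but not the contents of Z, which the function only reads at run time. One possible workaround is to pass the data in as an argument so an input hash can see it. A stdlib-only sketch, where `stand_in_key` is a hypothetical stand-in for task_input_hash, not Prefect's implementation:

```python
import hashlib
import json


def stand_in_key(fn, arguments):
    # Hypothetical stand-in for an input-hashing cache key:
    # the function's bytecode plus a JSON dump of its arguments.
    payload = fn.__code__.co_code.hex() + json.dumps(arguments, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


Z = {1: 1}


def task1():
    # Reads the global at call time, so Z never reaches the key.
    return Z[1]


def task1_explicit(z):
    # Same logic, but the data is now an explicit input.
    return z[1]


before = stand_in_key(task1, {})
before_explicit = stand_in_key(task1_explicit, {"z": Z})

Z[1] = 2  # the "data changed" scenario from the thread

print(stand_in_key(task1, {}) == before)  # True: the change is invisible
print(stand_in_key(task1_explicit, {"z": Z}) == before_explicit)  # False
```

In Prefect terms, the equivalent change would be giving task1 a parameter and calling it with Z from the flow, so that task_input_hash should hash the dictionary along with the code.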

Mason Menges

10/03/2022, 5:59 PM
You might be able to implement a conditional check around the edge cases you're seeing, then run the task with with_options to disable the cache when the check triggers, and otherwise run it with the default configuration. Something like this, I think:

Jeff Quinn

10/03/2022, 6:00 PM
OK, I'll check out the with_options API.
I'm really surprised at some of the things that are vs. aren't picked up.
For example, if you change a function call within the task, that is not picked up:
Z = 2
def calc():
  return 1
def calc2():
  return 2

@task(cache_key_fn=task_input_hash)
def task4():
  return calc()

@flow
def flow4():
  return task4()
Changing between calc() and calc2() inside task4 doesn't invalidate the cache (!)
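This also fits the bytecode assumption: CPython compiles a global lookup as an index into the code object's `co_names` tuple, so `return calc()` and `return calc2()` produce identical `co_code` and differ only in `co_names`. A stdlib-only sketch that checks this, plus a hypothetical `strict_fingerprint` that also mixes in `co_names` and `co_consts` (an illustration, not something Prefect provides):

```python
import hashlib


def calc():
    return 1


def calc2():
    return 2


def task4_v1():
    return calc()


def task4_v2():
    return calc2()


def strict_fingerprint(fn):
    # Hypothetical: hash referenced names and constants, not just bytecode.
    code = fn.__code__
    payload = code.co_code.hex() + repr(code.co_names) + repr(code.co_consts)
    return hashlib.sha256(payload.encode()).hexdigest()


print(task4_v1.__code__.co_code == task4_v2.__code__.co_code)  # True
print(task4_v1.__code__.co_names, task4_v2.__code__.co_names)  # ('calc',) ('calc2',)
print(strict_fingerprint(task4_v1) == strict_fingerprint(task4_v2))  # False
```

Even the stricter version would miss an edit inside calc's own body, since only task4's code object is hashed.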
Ah yeah, using with_options I can avoid a cache hit, but the new result itself won't be cached, and the old cached value will remain in the DB.
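One way to get a fresh result that is also cached is to put a manually bumped version string into the cache key: a new version misses the old entries and caches under a new key, while the old entries stay in the database but stop being hit. The sketch follows the `cache_key_fn(context, parameters)` call shape Prefect 2 uses; `make_versioned_cache_key` is a hypothetical name, and the `SimpleNamespace` stub only stands in for Prefect's task run context so the sketch runs without Prefect.

```python
import hashlib
import json
from types import SimpleNamespace


def make_versioned_cache_key(version):
    """Hypothetical factory: returns a cache key function that mixes in `version`."""

    def cache_key(context, parameters):
        # Combine the task's name, the manual version, and the inputs.
        payload = json.dumps(
            {"task": context.task.name, "version": version, "parameters": parameters},
            sort_keys=True,
            default=repr,  # fall back to repr() for non-JSON inputs
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    return cache_key


# Stub standing in for Prefect's task run context (assumption for the demo).
ctx = SimpleNamespace(task=SimpleNamespace(name="task4"))

key_v1 = make_versioned_cache_key("v1")(ctx, {"x": 1})
key_v2 = make_versioned_cache_key("v2")(ctx, {"x": 1})
print(key_v1 == key_v2)  # False: bumping the version yields a new key
```

In a flow, this would be attached as @task(cache_key_fn=make_versioned_cache_key("v2")), with the string bumped by hand whenever the ETL logic changes.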

Mason Menges

10/03/2022, 6:05 PM
Yep. Also, for anything that stands out to you as a change that might be useful, you can always open a feature request here: https://github.com/PrefectHQ/prefect/issues/new/choose 😄

Jeff Quinn

10/03/2022, 6:08 PM
Sure, I'll summarize there.

Mason Menges

10/03/2022, 6:08 PM
We're also working on pushing out configurable results. They're not here yet, but hopefully soon, and they may help in this case as well: depending on what the tasks return, in the future you could use results storage to reference previous task run outputs for comparison, though that depends on your use case.

Jeff Quinn

10/03/2022, 6:16 PM