
Tim-Oliver

01/05/2023, 4:36 PM
Hi, with result persistence on, is it possible to trigger a task re-run if the persisted result got deleted or corrupted? I am thinking about a case where the cache key is still available in the Orion DB, but the result can't be retrieved from local storage. In such a case I would like to invalidate the cache key and trigger a new task-run to obtain the missing/corrupted result.

Mason Menges

01/05/2023, 8:18 PM
Hey @Tim-Oliver, I don't think I've run into anything like this personally, but I would think task-level retries would accommodate this, though it kinda depends on what you mean by task re-run. Are you triggering a retry of the flow run from the UI?

Tim-Oliver

01/06/2023, 8:10 AM
I just tried it out with retries, but it does not work.
My understanding of task-retries is that a task-run is tried again when the task fails due to some internal task error, e.g. a connection timeout to a DB.
The pattern I am trying to solve is:
1. Submit flow-run and wait for completion.
2. Go to storage and delete one of the persisted results.
3. Submit another flow-run with the same parameters as in 1.
Now what I would like to achieve is that the 2nd flow run only recomputes the missing result and all other task-runs are skipped due to the result being cached.
However, what I see (with and without retries on) is that the flow is executed and the result is shown as cached in the UI, but it can't be retrieved from local storage because it got deleted.
It does work when I go into the first flow-run and delete the task-run. I guess this removes the entry from the DB, so it does not even try to load the result from local storage.
I think I am looking for a function which is called after the validation of the cache-key to check whether the result is present in local storage. Only if both the cache-key and the stored result are good should the cached result be used.
I am also happy to open a GitHub issue 🙂
I think I could get there by using a special cache_key_fn which first computes the cache_key and then tries to access the local storage. If the file is found, the normal cache_key is returned; otherwise some other unique cache_key is returned. I thought a bit more and concluded that this will not work that easily, because I would have to recreate the unique cache_key in later runs if the re-computed result should be reused.
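For illustration, the storage-aware cache_key_fn idea might look roughly like the sketch below. It is not tested against Prefect: it only mirrors the `(context, parameters)` call shape of a cache_key_fn. The names `stable_key` and `make_storage_aware_cache_key_fn` are hypothetical, and the sketch assumes the persisted result is a file named after the cache key, which is not necessarily how results are actually stored.

```python
import hashlib
import json
import uuid
from pathlib import Path
from typing import Any, Callable, Dict


def stable_key(parameters: Dict[str, Any]) -> str:
    """Deterministic key derived from the task inputs (hypothetical helper)."""
    payload = json.dumps(parameters, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_storage_aware_cache_key_fn(
    result_dir: Path,
) -> Callable[[Any, Dict[str, Any]], str]:
    """Build a function with the (context, parameters) shape of a cache_key_fn."""

    def cache_key_fn(context: Any, parameters: Dict[str, Any]) -> str:
        key = stable_key(parameters)
        # Assumption: the persisted result is a file named after the key.
        if (result_dir / key).is_file():
            return key
        # Result missing: return a fresh key so the lookup misses and the task runs.
        return f"{key}-{uuid.uuid4().hex}"

    return cache_key_fn
```

The drawback mentioned above is visible here: after a miss, the re-computed result is registered under the random key, so a later run with the same inputs cannot find it again unless that key can be reproduced.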
Now I just have to figure out how I can get the local storage results.
My current understanding of how caching works:
1. cache_key_fn is called with TaskRunContext and input-arguments.
   a. cache_key is computed
2. Orion looks up if cache_key exists.
   a. If cache_key exists, state is set to completed (cached)
   b. otherwise task is marked to be executed
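A toy model of this understanding, in plain Python with no Prefect involved, shows why a deleted file goes unnoticed. `ToyCache` and its `records` dict are hypothetical stand-ins for the database lookup, not the real implementation.

```python
from pathlib import Path
from typing import Callable, Dict


class ToyCache:
    """Toy stand-in for the database lookup; not Prefect code."""

    def __init__(self) -> None:
        # cache_key -> file that holds the persisted result
        self.records: Dict[str, Path] = {}

    def run_task(
        self, cache_key: str, compute: Callable[[], str], result_path: Path
    ) -> str:
        if cache_key in self.records:
            # Step 2a: the key exists, so the run is marked as cached.
            # The result file itself is never checked here.
            return "Cached"
        # Step 2b: no record, so the task executes and persists its result.
        result_path.write_text(compute())
        self.records[cache_key] = result_path
        return "Completed"
```

In this model, deleting the result file after the first run and calling `run_task` again with the same key still returns "Cached", which matches the behavior described earlier in the thread.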
Would the following work?
1. cache_key_fn is called with TaskRunContext and input-arguments
   a. compute cache_key
   b. Check via REST API if the cache_key exists
   c. If the cache_key exists, retrieve the result-state via REST API
   d. Check if the local storage result is valid
      i. If valid, return cache_key
      ii. otherwise, delete the existing task-run via REST API and then return cache_key
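The control flow of that proposal can be sketched without any real API calls. In the sketch below a plain dict (`records`) stands in for what the REST lookups would return, and deleting a dict entry stands in for deleting the task-run. `resolve_cache_key` and `file_is_valid` are hypothetical names, and whether the real API allows this from inside a cache_key_fn is untested.

```python
from pathlib import Path
from typing import Callable, Dict


def file_is_valid(path: Path) -> bool:
    """Minimal validity check: the file exists and is not empty."""
    return path.is_file() and path.stat().st_size > 0


def resolve_cache_key(
    cache_key: str,
    records: Dict[str, Path],
    is_valid: Callable[[Path], bool] = file_is_valid,
) -> str:
    """Return cache_key, dropping a stale record first if its result is bad."""
    # Steps b and c: look up the record for this key (stand-in for REST calls).
    path = records.get(cache_key)
    # Step d: check the stored result.
    if path is not None and not is_valid(path):
        # Step d.ii: remove the stale entry so the next lookup misses
        # and the task is executed again under the same key.
        del records[cache_key]
    # Steps d.i and d.ii both end by returning the unchanged key.
    return cache_key
```

Because the key itself never changes, the re-computed result would be registered under the same key and could be reused by later runs, which avoids the problem with the random-key approach.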

Deceivious

03/06/2023, 1:10 PM
Did anyone actually solve this issue?

Tim-Oliver

03/06/2023, 1:26 PM
I haven't heard back and have not tried it out, but I would be interested to know if this works.