Hello! If I have a flow that is successfully check-pointing results (with S3Result in my case), what's the best way to have the flow use the checkpoint in successive invocations as a cache?
Kevin Kho
07/09/2021, 1:52 PM
Hey @Hugo Shi, it sounds like you can use targets, which provide file-based caching. If the target file exists, the result is retrieved and the task is not re-run. You can then use a filename that includes a date or time to give the target a "lifespan". For example, templating YYYY-MM-DD into the filename means the task will not re-run for the rest of that day.
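The date-based lifespan idea can be sketched in plain Python. Prefect itself renders the target string from its run context; the template name and path below are illustrative, not Prefect's API:

```python
from datetime import date

# Hypothetical target template: embedding today's date means the
# rendered path changes each day, so a cached file is only reused
# within the same calendar day.
target_template = "results/{today}/output.json"
target_path = target_template.format(today=date.today().strftime("%Y-%m-%d"))

# On the first run of the day the file does not exist yet, so the
# task runs and writes it; later runs that day find the file and skip.
```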
Hugo Shi
07/09/2021, 1:54 PM
Thanks @Kevin Kho! Do you know if the data from the target will be passed to downstream tasks?
Kevin Kho
07/09/2021, 1:57 PM
I think it will, as long as it is returned by the cached task and then used by the downstream task, like:

a = cached_task()
b = downstream_task(a)
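The behavior being described can be simulated with plain Python: a task that loads its result from the target file when it exists (instead of recomputing), then hands that value to a downstream task either way. This is a sketch of the caching pattern, not Prefect's internals; the function and file names are made up:

```python
import json
import os
import tempfile

def cached_task(target_path):
    # If the target file exists, load and return its contents
    # instead of recomputing; this mirrors how a target short-circuits
    # the task run while still supplying data downstream.
    if os.path.exists(target_path):
        with open(target_path) as f:
            return json.load(f)
    data = {"value": 42}  # stand-in for an expensive computation
    with open(target_path, "w") as f:
        json.dump(data, f)
    return data

def downstream_task(data):
    # Receives the cached (or freshly computed) value either way.
    return data["value"] * 2

target = os.path.join(tempfile.gettempdir(), "demo_target.json")
a = cached_task(target)   # first run computes and writes the target
b = downstream_task(a)    # subsequent runs read the file instead
```

Either way the downstream task sees the same value, which is the point: the target only decides whether the upstream work re-executes, not whether its output is available.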
Kevin Kho
07/09/2021, 1:59 PM
Actually, how big is the data you are storing? If it's small and not sensitive, the KV Store is also an option. Edit: you probably want the target, though.