I have an Orion-specific question. We're looking at how to move from Prefect Server to Orion. I see that Orion supports caching, but is there any plan to offer an equivalent of Prefect Server's "checkpointing", which ensures that every time a task runs successfully, its return value is written to persistent storage and cached for subsequent runs?
Kevin Kho
11/12/2021, 2:33 PM
Hey @Vipul, will ask the team about this
Vipul
11/12/2021, 3:11 PM
Thanks @Kevin Kho, it would be good to know. Most of our tasks are CPU-intensive, so we would like to persist the results so that any subsequent run reads them from the cache rather than recomputing them.
Zanie
11/12/2021, 3:37 PM
The short answer is: yes, there's task run caching. See cache_key_fn and cache_expiration: https://orion-docs.prefect.io/api-ref/prefect/tasks/
The longer answer is: not quite yet. We've established the building blocks for this, but have not made it straightforward to use beyond trivial cases. I believe that you can set up persistence of task run data to the Orion server's local file system, but we have not done thorough testing of this yet. We'll likely flesh this out as a milestone and release improvements / documentation.
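To make the idea concrete, here is a minimal standard-library sketch of checkpointing: a cache key is derived from the task's inputs, the return value is written to the local file system, and later calls with the same inputs read the stored value instead of recomputing it. This is not Orion's API. The names `input_hash`, `checkpointed`, and `expensive_square` are hypothetical and only mirror the roles that cache_key_fn and cache_expiration play, and the sketch assumes the arguments and results can be pickled.

```python
import hashlib
import pickle
import tempfile
import time
from datetime import timedelta
from functools import wraps
from pathlib import Path


def input_hash(func_name, args, kwargs):
    """Hypothetical key function: hash the function name and its arguments."""
    payload = pickle.dumps((func_name, args, sorted(kwargs.items())))
    return hashlib.sha256(payload).hexdigest()


def checkpointed(cache_dir, expiration=None):
    """Hypothetical decorator: persist each result to disk and reuse it later.

    `expiration` is an optional timedelta; older files are recomputed.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            key = input_hash(func.__name__, args, kwargs)
            path = cache_dir / f"{key}.pkl"
            if path.exists():
                age = time.time() - path.stat().st_mtime
                if expiration is None or age < expiration.total_seconds():
                    # Cache hit: return the persisted value without recomputing.
                    return pickle.loads(path.read_bytes())
            # Cache miss (or expired): compute, then write the checkpoint.
            result = func(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))
            return result

        return wrapper

    return decorator


# Usage: the second call is served from disk, so the body runs only once.
calls = []

with tempfile.TemporaryDirectory() as tmp:

    @checkpointed(tmp, expiration=timedelta(hours=1))
    def expensive_square(x):
        calls.append(x)  # record each real computation
        return x * x

    first = expensive_square(12)
    second = expensive_square(12)
```

Because each result lives in an ordinary file named after its key, a directory like this could in principle be copied between environments and read back by the same code, which is the kind of replay workflow checkpointing is meant to enable.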
Brad
11/12/2021, 11:52 PM
+1 to this feature
Vipul
11/13/2021, 1:27 PM
Thanks @Zanie, it would be good to add this feature, as we really like persisting the results to files. We can then copy them from our PROD to DEV and replay the exact failure we see in our PROD env.
Zanie
11/13/2021, 3:48 PM
Yep! This is absolutely going to be a feature in Orion and it will take into account everything we've learned about checkpointing so far.
Brad
11/13/2021, 10:48 PM
Hey @Zanie - one immediate (and simple!) improvement here would be to use the excellent fsspec (https://github.com/fsspec/filesystem_spec) library, which would allow many backends rather than just the local filesystem. From a quick look at the data doc / location source, it shouldn’t be that large a change - is this on the radar at all?