Gol Bahar

06/21/2022, 2:14 AM
Hello, I have recently been reading the docs on Prefect 2.0 and had a few questions.
• Storage is mentioned in the docs, but there are no examples of how it actually fits into the big picture. What’s the best way to handle passing larger-than-memory data between tasks/flows? Does Storage fit here?
• Let’s say I have a flow A that does preprocess, train, and test, and now I want flow B to be exactly like A except with a different training task/flow. Is there a way to override only parts of a flow?
Thank you.

Kevin Kho

06/21/2022, 4:39 AM
Storage is for storing a flow’s code. I don’t think we have anything to support larger-than-memory data. How would you do it in Python without Prefect? We do support execution on Dask, though, and Dask has out-of-memory capabilities to some extent. The results of tasks are passed in memory to other tasks. At the moment, though, all task outputs are checkpointed and persisted. This will be more configurable in the near future, but that is not related to Storage. That is called Results. It just so happens that right now Results use the default Storage, but the interface for it will be released before general availability.
For the second question, there are many ways to do this. I would suggest you just do it as you would with normal Python:
1. you can have if-else inside flows to change the task/sub-flow
2. you can use parameters and pick the right model to use inside a task
:upvote: 1
👍 1