
Prefect Community

Question about memory usage, since the system passes results back from tasks. Say task A pulls 100MB of data from a data store, task B takes that 100MB, does some data transforms on it, and returns 100MB, and task C does the same and returns another 100MB. Does that flow need 300MB to run? If so, how do you avoid exploding memory when using small tasks?

Hi Jeff, if you're calling `flow.run()` yourself, the results of all tasks are kept on the returned output. For small results this is fine, but for larger ones it can be a problem; in that case you might write intermediates to disk and pass a file path between tasks instead. This design is intentional: many users calling `flow.run()` want to inspect the output of every task afterward.
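A minimal sketch of the "write intermediates to disk and pass a file path" idea, in plain Python so it runs without Prefect installed. The function names (`extract`, `transform`, `run_pipeline`), the file names, and the byte-inverting "transform" are all hypothetical stand-ins for tasks A, B and C; in a Prefect 1.x flow each function would presumably be wrapped with `@task`. Each step still holds its own input and output in memory while it runs, but what gets retained as a "result" is only a short path string.

```python
import os
import tempfile

# Translation table mapping byte value b -> 255 - b (a stand-in "transform").
INVERT = bytes(range(255, -1, -1))


def extract(workdir: str, size: int) -> str:
    # Hypothetical task A: pretend to pull `size` bytes from a data store
    # and write them to disk instead of returning them.
    path = os.path.join(workdir, "raw.bin")
    with open(path, "wb") as f:
        f.write(bytes(range(256)) * (size // 256))
    return path  # only the small path string flows downstream


def transform(in_path: str, out_name: str) -> str:
    # Hypothetical tasks B/C: read the input file, transform it,
    # write the output file, and return its path.
    with open(in_path, "rb") as f:
        data = f.read()
    out_path = os.path.join(os.path.dirname(in_path), out_name)
    with open(out_path, "wb") as f:
        f.write(data.translate(INVERT))
    return out_path


def run_pipeline(workdir: str, size: int) -> dict:
    # Mirrors A -> B -> C; the results dict holds paths, not large payloads.
    results = {}
    results["a"] = extract(workdir, size)
    results["b"] = transform(results["a"], "b.bin")
    results["c"] = transform(results["b"], "c.bin")
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = run_pipeline(tmp, size=1024 * 1024)
        for name, path in paths.items():
            print(name, path, os.path.getsize(path), "bytes on disk")
```

The trade-off is extra disk I/O in exchange for keeping only tiny path strings in the retained results.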

For flows run via Cloud/Server (registered with `flow.register()`), intermediate results aren't kept around in memory: once an intermediate result is no longer needed by any downstream task, it is dropped.
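A toy illustration, in plain Python, of why dropping intermediates matters for Jeff's A -> B -> C example. This is not how Prefect is implemented internally; it only compares keeping every step's result (as a full results record does) with keeping just the latest one. The step count, buffer size, and function names are arbitrary assumptions. Python's `tracemalloc` reports the peak traced allocation: keeping everything peaks at roughly the sum of all results, while dropping intermediates peaks at roughly one step's input plus its output.

```python
import tracemalloc

MB = 1024 * 1024
INVERT = bytes(range(255, -1, -1))  # maps byte b -> 255 - b


def step(data: bytes) -> bytes:
    # Hypothetical transform returning a new buffer of the same size.
    return data.translate(INVERT)


def keep_all(n_steps: int, size: int) -> list:
    # Every result stays referenced by the list, so none can be freed.
    results = [bytes(size)]
    for _ in range(n_steps - 1):
        results.append(step(results[-1]))
    return results


def drop_intermediates(n_steps: int, size: int) -> bytes:
    # Rebinding `current` releases the previous buffer's last reference.
    current = bytes(size)
    for _ in range(n_steps - 1):
        current = step(current)
    return current


def peak_bytes(fn, *args) -> int:
    # Peak traced allocation while `fn` runs (tracemalloc resets on stop/start).
    tracemalloc.start()
    result = fn(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return peak


if __name__ == "__main__":
    for fn in (keep_all, drop_intermediates):
        print(fn.__name__, round(peak_bytes(fn, 3, 10 * MB) / MB), "MB peak")
```

With three 10MB steps, `keep_all` should peak a little above 30MB and `drop_intermediates` a little above 20MB.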

Ah, OK, that makes sense. That is super helpful.