jeff n

01/05/2021, 7:21 PM
Question about memory usage, since the system passes results back from tasks. If task A pulls 100MB of data from a data store, then task B takes that 100MB, does some data transforms on it, and returns 100MB of data, and then task C does the same and returns 100MB, does that flow need 300MB to run? If so, how do you avoid exploding memory when using small tasks?

Jim Crist-Harif

01/05/2021, 7:28 PM
Hi Jeff, if you're calling flow.run() yourself, the results of all tasks will be stored on the output. For small results this is fine, but for larger ones that could be problematic. You might write intermediates to disk and pass a filepath around if needed. This design was intentional: many users calling flow.run() want to inspect the output of all tasks afterwards. For flows run via cloud/server (registered via flow.register()), intermediate results won't be kept around in memory; once an intermediate result is no longer needed, it will be dropped.
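A rough sketch of the "write intermediates to disk and pass a filepath around" idea, in case it helps. The function names, file names, and the uppercase "transform" are all made up for illustration; in a real flow each function would be wrapped as a Prefect task, but plain functions are used here so the snippet runs on its own. Each step returns only a path, so the value held for each task is a short string rather than 100MB of data.

```python
import os
import tempfile

# Hypothetical stand-ins for tasks A, B and C. In a Prefect flow each of
# these would be wrapped as a task; plain functions keep the sketch
# runnable without Prefect installed.

def pull_data(workdir, n_bytes):
    """Task A: fetch data, write it to disk, and return only the path."""
    path = os.path.join(workdir, "a.bin")
    with open(path, "wb") as f:
        f.write(b"x" * n_bytes)  # placeholder for the real data-store read
    return path

def transform(in_path, out_path, chunk_size=1 << 20):
    """Tasks B and C: stream input to output in chunks, return the new path."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk.upper())  # placeholder for the real transform
    os.remove(in_path)  # assumption: the upstream file is not needed again
    return out_path

def run_pipeline(workdir, n_bytes):
    """Chain A -> B -> C, passing file paths instead of the data itself."""
    a = pull_data(workdir, n_bytes)
    b = transform(a, os.path.join(workdir, "b.bin"))
    c = transform(b, os.path.join(workdir, "c.bin"))
    return c

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        final_path = run_pipeline(workdir, 1000)
        print(final_path, os.path.getsize(final_path))
```

Deleting the upstream file inside the transform keeps disk usage bounded, but it would need rethinking if a task may be retried or if several downstream tasks read the same intermediate.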

jeff n

01/05/2021, 7:48 PM
Ah ok, that makes sense. That is super helpful.