Ben Epstein

02/11/2021, 2:55 PM
After reading the “Why not Airflow” post for the 50th time, I'm more and more convinced by Prefect, especially on dataflow. But I’m looking at the docs and can't seem to find the limitations on dataflow. What are the limits on how much data can be passed between tasks in a flow? And are there docs/further reading on where and how that data is persisted?


02/11/2021, 4:04 PM
Hi @Ben Epstein. The limits on how much data can be passed are mostly about memory. The result of each task is held in memory on the dask worker until the entire flow has completed. This allows us to quickly pass results from task to task and retrieve them at the end of a flow run. If checkpointing is enabled, the task results are also persisted to a location that will still exist after the flow has finished running. This allows a future flow run to resume from cached values.
See for more details on results/persistence
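
To make the answer above concrete, here is a minimal toy sketch in plain Python of the two behaviours described: results kept in memory for the whole run, and optional checkpointing to a location that outlives the run. It is not Prefect's API or implementation. The `run_flow` function, the `checkpoint_dir` parameter, and the pickle file layout are all hypothetical names invented for this illustration.

```python
import pickle
import tempfile
from pathlib import Path


def run_flow(tasks, checkpoint_dir=None):
    """Run tasks in order, feeding each result to the next task.

    Toy model only (hypothetical, not Prefect): `tasks` is a list of
    (name, function) pairs, and each function receives the previous
    task's result (None for the first task).
    """
    results = {}   # every task result stays in memory until the flow ends
    executed = []  # names of tasks that actually ran (not loaded from disk)
    previous = None
    for name, fn in tasks:
        path = Path(checkpoint_dir) / f"{name}.pkl" if checkpoint_dir else None
        if path is not None and path.exists():
            # A checkpoint from an earlier run exists: reuse it, skip the work.
            previous = pickle.loads(path.read_bytes())
        else:
            previous = fn(previous)
            executed.append(name)
            if path is not None:
                # Checkpointing enabled: persist the result beyond this run.
                path.write_bytes(pickle.dumps(previous))
        results[name] = previous
    return results, executed


tasks = [
    ("extract", lambda _: list(range(5))),
    ("double", lambda xs: [2 * x for x in xs]),
]

with tempfile.TemporaryDirectory() as tmp:
    first_results, first_executed = run_flow(tasks, checkpoint_dir=tmp)
    second_results, second_executed = run_flow(tasks, checkpoint_dir=tmp)
```

In this sketch the first run executes both tasks and writes their results to disk, while the second run loads them from the checkpoint files instead of recomputing. Note that `results` grows with every task, which is why the practical limit in this model is the memory of the process holding it.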

Ben Epstein

02/11/2021, 4:08 PM
Awesome, thanks! Just what I was looking for.