Yannick
01/04/2021, 10:08 AM
• Task results are serialized with `cloudpickle` so that Dask workers can pass the data. Does that mean that data passing between the tasks is completely in Dask's hands? In other words, is the data passed over the network when Dask schedules the tasks onto different machines in the cluster? (See the first sketch below this list.)
• About large data, from the docs: "Don't worry about passing large data objects between tasks. As long as it fits in memory, Prefect can handle it with no special settings." What exactly should fit in memory here: the sum of all output data in the flow, or is there some sort of eviction going on? Example: when building a flow like A --> B, A --> C, and B --> D, should the output from A plus the output from B fit completely in memory at the same time? Secondly, the docs say: "(...) If it doesn't, there are ways to distribute the operation across a cluster." How would I go about doing that? (See the second sketch below this list.)
• For Input Caching, is there any way to configure how it works? The docs state that "Input caching is an automatic caching", but I would like additional control over it.
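For concreteness, a minimal sketch of the setup the first bullet is asking about, assuming a Prefect 0.14-style API; the scheduler address is a placeholder, not anything from this thread:

```python
from prefect import task, Flow
from prefect.executors import DaskExecutor

@task
def produce():
    return list(range(1_000))  # the result is cloudpickled for transport

@task
def consume(data):
    return sum(data)

with Flow("data-passing") as flow:
    consume(produce())

# Placeholder address for an existing Dask cluster. With a DaskExecutor,
# Prefect submits each task to the Dask scheduler; Dask picks the worker,
# and if `consume` lands on a different machine than `produce`, Dask
# ships the pickled result over the network.
flow.run(executor=DaskExecutor(address="tcp://dask-scheduler:8786"))
```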
Many, many thanks! 🙏
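And for the second bullet's "distribute the operation across a cluster", one common pattern (not necessarily the one the docs had in mind) is task mapping, so each chunk becomes its own Dask task; a sketch under the same assumptions as above:

```python
from prefect import task, Flow, unmapped
from prefect.executors import DaskExecutor

@task
def get_chunks():
    # Return many small pieces instead of one object that has to fit
    # in a single worker's memory.
    return [list(range(i, i + 100)) for i in range(0, 1_000, 100)]

@task
def process(chunk, factor):
    return sum(x * factor for x in chunk)

@task
def combine(partials):
    return sum(partials)

with Flow("distributed-op") as flow:
    chunks = get_chunks()
    # Each mapped child becomes its own Dask task, so the work (and the
    # per-chunk data) is spread across the cluster's workers.
    partials = process.map(chunks, factor=unmapped(2))
    combine(partials)

flow.run(executor=DaskExecutor(address="tcp://dask-scheduler:8786"))
```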
Kyle Moon-Wright
01/04/2021, 7:19 PM
Yannick
01/05/2021, 7:33 AM
Kyle Moon-Wright
01/05/2021, 4:51 PM
You can set a `cache_for` duration for future Flow Runs.
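For reference, a minimal sketch of what that could look like with the Prefect 0.x task decorator; the validator choice here is just one option, not necessarily what Kyle had in mind:

```python
import datetime

from prefect import task, Flow
from prefect.engine.cache_validators import all_inputs

# Cache the task's output for an hour; `all_inputs` re-uses the cached
# result only while the task's inputs are unchanged. Other validators
# (e.g. all_parameters, duration_only) give different invalidation rules.
@task(cache_for=datetime.timedelta(hours=1), cache_validator=all_inputs)
def expensive(x):
    return x ** 2

with Flow("cached") as flow:
    expensive(4)
```

The `cache_validator` argument may also be the hook for the extra control over caching asked about above.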
Yannick
01/05/2021, 5:19 PM