Hi @Michael Wooley -- you can't yield data from a task right now (and I don't think Dask would make that easy to implement). I think your best option is to write the data to files and then pass references to those files to your mapped transform, as you mentioned. If you pass the data itself instead of a pointer, you will almost certainly run into memory constraints.
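To make the file-reference idea concrete, here is a rough sketch in plain Python (standard library only, no orchestration API). All of the names here (`produce_chunks`, `extract`, `transform`, `run`) are made up for illustration, and the JSON-in-a-temp-directory layout is just an assumption. In a real flow, `extract` and `transform` would be tasks, and the list comprehension would be replaced by mapping `transform` over the returned paths.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator, List


def produce_chunks() -> Iterator[List[int]]:
    """Hypothetical data source that yields the data one chunk at a time."""
    for start in range(0, 9, 3):
        yield list(range(start, start + 3))


def extract(out_dir: Path) -> List[str]:
    """Write each chunk to its own file and return only the file paths."""
    paths = []
    for i, chunk in enumerate(produce_chunks()):
        path = out_dir / f"chunk_{i}.json"
        path.write_text(json.dumps(chunk))
        paths.append(str(path))
    return paths


def transform(path: str) -> int:
    """Load a single chunk from disk and reduce it to its sum."""
    chunk = json.loads(Path(path).read_text())
    return sum(chunk)


def run() -> List[int]:
    with tempfile.TemporaryDirectory() as tmp:
        paths = extract(Path(tmp))
        # Stand-in for the mapped transform: one call per file path.
        return [transform(p) for p in paths]
```

Only the short path strings travel between the two steps, so each `transform` call holds just one chunk in memory. With distributed workers the files would need to live somewhere every worker can read (shared disk or object storage), which this local sketch does not cover.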