Hello everyone, we build flows that process large time series of GeoTIFF rasters (loaded as xarrays). What do you suggest for passing massive data between tasks? Is it better to store the files temporarily on S3 or EFS, or to treat them as regular files and return them from the task functions?
Zanie
02/01/2021, 4:13 PM
Hi @Giovanni Giacco — you’ll almost certainly run into memory issues if passing large data objects between tasks because they’ll be persisted as flow results. I’d recommend passing a file reference.
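The file-reference pattern above can be sketched as follows. This is a minimal illustration using plain Python functions standing in for flow tasks and a local temp directory standing in for S3/EFS; the function names (`write_raster`, `process_raster`) are hypothetical, not part of any orchestration API.

```python
import tempfile
from pathlib import Path

def write_raster(data: bytes, storage_dir: Path) -> str:
    """Upstream task: persist the large payload and return only its path."""
    path = storage_dir / "raster.tif"
    path.write_bytes(data)
    # Only this small string is passed downstream, not the raster itself,
    # so the orchestrator never has to serialize the large array as a result.
    return str(path)

def process_raster(path: str) -> int:
    """Downstream task: receives the reference and loads the data itself."""
    data = Path(path).read_bytes()
    return len(data)

storage = Path(tempfile.mkdtemp())
ref = write_raster(b"\x00" * 1024, storage)  # returns a path, not 1 KB of bytes
size = process_raster(ref)
print(size)  # 1024
```

With real rasters, `storage_dir` would be an S3 prefix or EFS mount and the downstream task would open the path with a raster library, but the shape of the exchange is the same: tasks hand each other small references, and each task reads and writes the heavy data itself.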
Giovanni Giacco
02/01/2021, 4:19 PM
Thank you, Michael, for the suggestion. We (@Marco Petrazzoli and I) will proceed with a file reference.