Hi #prefect-community, I have task mapping over a list of images (~12000):
out = func.map([A1, A2, A3, A4, A5, B1, B2, B3, B4, B5])
In the next step I would like to partially reduce the output and combine only the matching subgroups, e.g. combining
A = out[0:5], B = out[5:]
and then process `A` and `B` in parallel. I have three questions: (1) If I understood correctly, order matters for mapping in Prefect, so input and output have the same order, correct? (2) I am running the code on an HPC cluster. If I proceed this way, will the entire mapped output be collected in memory, or will the different output groups be dispatched to the specific worker where the reduce is happening? (3) Is there a more efficient way to do this? Thanks a lot!
Hi Simone, Here are some control flow resources: https://docs.prefect.io/core/task_library/control_flow.html
I don’t think it’s possible to take slices of the mapped output like that, because the result of the map doesn’t instantiate until runtime
You might be able to use
from the control flow utilities instead
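Since the mapped results only exist at runtime, one option is to do the grouping inside a downstream function that runs after the map has finished. Here is a rough sketch in plain Python, not the Prefect API: `func`, `group_by_prefix`, `reduce_group` and the first-letter key are all hypothetical stand-ins, and a thread pool plays the role of the mapped tasks. Carrying a key with each result means the grouping does not depend on list positions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def func(name):
    # Stand-in for the per-image task; returns a (name, value) pair.
    return name, len(name)


def group_by_prefix(results):
    # Runs only once the mapped results are real values.
    groups = defaultdict(list)
    for name, value in results:
        groups[name[0]].append(value)  # key = first letter, e.g. "A" or "B"
    return dict(groups)


def reduce_group(values):
    # Stand-in for the per-group reduce step.
    return sum(values)


names = ["A1", "A2", "A3", "A4", "A5", "B1", "B2", "B3", "B4", "B5"]

with ThreadPoolExecutor() as pool:
    # Executor.map returns results in the same order as the inputs.
    out = list(pool.map(func, names))
    groups = group_by_prefix(out)
    keys = sorted(groups)
    # Reduce each subgroup in parallel, one call per group.
    reduced = dict(zip(keys, pool.map(reduce_group, (groups[k] for k in keys))))
```

In a real flow the grouping step would be its own task placed between the map and the per-group reduce.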
Usually when I run into memory problems in this way, I store the data in the cloud (S3 or GCS) and then pass around a list of references to data in cloud storage
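A minimal sketch of that "pass references, not data" pattern, using only the standard library. A local temp directory stands in for S3 or GCS here, and `process_and_store`, `reduce_from_paths` and the `.bin` placeholder files are hypothetical names for illustration:

```python
import tempfile
from pathlib import Path


def process_and_store(name, out_dir):
    # Stand-in for the per-image task: write the result to storage
    # and return only the path, not the data itself.
    path = Path(out_dir) / f"{name}.bin"
    path.write_bytes(name.encode() * 1000)  # placeholder for image bytes
    return str(path)


def reduce_from_paths(paths):
    # Load one file at a time, so only one item is in memory at once.
    total = 0
    for p in paths:
        total += len(Path(p).read_bytes())
    return total


out_dir = tempfile.mkdtemp()
names = ["A1", "A2", "B1", "B2"]
refs = [process_and_store(n, out_dir) for n in names]  # small list of strings

# Each reduce only receives the references for its own subgroup.
a_refs = [r for r in refs if Path(r).name.startswith("A")]
b_refs = [r for r in refs if Path(r).name.startswith("B")]
size_a = reduce_from_paths(a_refs)
size_b = reduce_from_paths(b_refs)
```

Only the short list of path strings travels between steps; each reducer reads just the files it needs.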
Thanks a lot! I will look into filtering. I guess in the worst-case scenario I will have ~12000 files
You can always clean them up as part of your flow!
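For example, the cleanup could be a final step wrapped in `try`/`finally`, so the intermediate files go away even if processing fails. This is only a sketch in plain Python; `run_with_cleanup` and the scratch directory layout are hypothetical:

```python
import shutil
import tempfile
from pathlib import Path


def run_with_cleanup(names):
    # Create a scratch directory for the intermediate files.
    scratch = Path(tempfile.mkdtemp(prefix="flow_scratch_"))
    try:
        paths = []
        for n in names:
            p = scratch / f"{n}.bin"
            p.write_bytes(b"x" * 10)  # placeholder for an intermediate result
            paths.append(p)
        # Stand-in for the reduce step: total size of the stored files.
        total = sum(p.stat().st_size for p in paths)
        return total, scratch
    finally:
        # Always remove the intermediate files, on success or on error.
        shutil.rmtree(scratch, ignore_errors=True)


total, scratch = run_with_cleanup(["A1", "B1"])
```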
So just out of curiosity, you do not think that this much i/o will affect the speed of the processing? The flow runs on an SSD drive
It definitely will
You’re making a tradeoff between i/o, storage, and memory
If you need this to run under a certain time, then increasing the available memory and passing the whole list of images will be faster
If you can spare some time & disk, then writing the images to disk and passing references decreases the total memory use but increases i/o time
You can get even cleverer with Dask & arrays and whatnot
The DaskExecutor will help with some of this out-of-the-box, but ultimately the whole array will be in memory at some point I believe
great! thanks a lot for the thorough explanation! really appreciated! I will play around and see what is the best solution for my application.
