Hey, how to do nested mapping?
# ask-community
i
Hey, how do I do nested mapping?
r
Hi, @Issam Assafi! I'm not sure, but it seems like nested mapping is not supported. Check this question: https://prefect-community.slack.com/archives/CL09KU1K7/p1628064350289600
i
I see. I have this piece of code:
page_path_pdf_pairs = get_pages_from_pdf(pdf_path)
texts_of_pages = get_text_from_page.map(page_path_pdf_pairs)
ocr_pages = join_ocr_pages(texts_of_pages)
But the code above is applied to only one PDF, and I have multiple PDFs. Any idea how I can handle this while keeping the advantage of parallelisation?
e
You can use flatten to, well, flatten your nested lists into a single list, which will then be mappable.
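from prefect import Flow, flatten

with Flow('x') as f:
    # One mapped run per PDF; each run returns a list of pages.
    page_path_pdf_pairs = get_pages_from_pdf.map(pdf_paths)
    # flatten merges the per-PDF lists so each page gets its own mapped run.
    texts_of_pages = get_text_from_page.map(flatten(page_path_pdf_pairs))
    ocr_pages = join_ocr_pages(texts_of_pages)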
Be careful: by using flatten you lose the nested structure, so you might lose the information about which page belongs to which PDF. Tag your pages with their source PDF names wherever possible, so you can group the pages back together.
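A rough sketch of the tagging idea in plain Python (no Prefect calls), assuming each page is carried as a (pdf_name, page_text) pair. The names pages_per_pdf, flat_pages and group_pages_by_pdf are hypothetical and only illustrate the regrouping step; in a flow, the grouping would live inside a downstream task such as join_ocr_pages.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical output of the first map: one list of (pdf_name, page_text) per PDF.
pages_per_pdf = [
    [("a.pdf", "a-page-1"), ("a.pdf", "a-page-2")],
    [("b.pdf", "b-page-1")],
]

# Flattening drops the nesting, but every page still carries its source tag.
flat_pages = list(chain.from_iterable(pages_per_pdf))


def group_pages_by_pdf(tagged_pages):
    """Rebuild {pdf_name: [page_text, ...]} from flat (pdf_name, page_text) pairs."""
    grouped = defaultdict(list)
    for pdf_name, page_text in tagged_pages:
        grouped[pdf_name].append(page_text)
    return dict(grouped)


grouped = group_pages_by_pdf(flat_pages)
```

Because the tag travels with each page, the order in which the mapped runs finish does not affect which PDF a page is assigned to.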
k
Nested mapping is not a thing because the first level map already uses the available resources, so the inner map wouldn’t really do anything. Emre’s suggestion might be the way to go.
w
Well.. that kinda depends on how many items are in the first level map
I'm trying to do something similar: doing a 2-step reduce..
k
Ah I see what you mean. If you share a code snippet, I could try and help you get it to one map?
w
not really code, but:
todo = { 'a': [1,2,3], 'b': [4,5,6], 'c': [7,8,9] }
I want to run a calculation on every number, then do a first level of reduce on every row (a, b, c)
and then reduce the a,b,c results a final time
I can model that by doing a 'flatten' for the number calculations and then doing a map on the keys afterwards
that would work but feels like a workaround
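Roughly, the workaround looks like this in plain Python, assuming squaring as a stand-in for the real calculation and a sum as the reduce (calculate, flat_items, row_totals and grand_total are made-up names, and the list comprehension stands where a mapped task would go):

```python
todo = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}


def calculate(n):
    """Hypothetical per-number calculation (here: square the number)."""
    return n * n


# Flatten to (key, number) pairs so a single map can cover every number.
flat_items = [(key, n) for key, numbers in todo.items() for n in numbers]

# In a flow this would be the one mapped task; the key is kept as a tag.
calculated = [(key, calculate(n)) for key, n in flat_items]

# First-level reduce: one total per row (a, b, c).
row_totals = {}
for key, value in calculated:
    row_totals[key] = row_totals.get(key, 0) + value

# Final reduce across the row results.
grand_total = sum(row_totals.values())
```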
k
It certainly is. I have been looking at the mapping code lately myself. Will keep this in mind. Are you on a DaskExecutor?
w
Probably we'll use Dask on ECS.. not sure yet