# prefect-community
b
I've got a question about nested mapping. Consider a trivial example:
from prefect import Flow, task

@task
def extract():
    return [1, 2, 3]

@task
def transform1(x):
    return [x, x * 2, x * 3]

@task
def load(x):
    print(f"output: {x}")


if __name__ == '__main__':
    with Flow("mapping test") as flow:
        e = extract()
        t1 = transform1.map(e)
        l = load(t1)
    flow.run()
This works like we expect and generates a new list for each of the items in the original list, with output [[1, 2, 3], [2, 4, 6], [3, 6, 9]]. However, let's say I wanted to then map over each of the items in each of these sublists. If I add a new task:
@task
def transform2(x):
    return x * 2
And then change my flow to:
        e = extract()
        t1 = transform1.map(e)
        t2 = transform2.map(t1)
        l = load(t2)
I see that the second task received each list as a whole in its mapped execution, rather than mapping over the internal scalars, so the output is [[1, 2, 3, 1, 2, 3], [2, 4, 6, 2, 4, 6], [3, 6, 9, 3, 6, 9]], i.e. we duplicated each sublist instead of multiplying each element by 2. An example from real life here is maybe:
• generate or grab a list of S3 buckets
• get the list of files in each bucket
• process each file individually
Any ideas? My docs search is coming up dry.
c
This is ultimately because multiplying a list by 2 duplicates that list --> mapping maps over each element of the iterable (whatever it may be). So in this case your pipeline:
- generates a list [1, 2, 3]
- maps over that list to create three lists [1, 2, 3], [2, 4, 6], etc.
- reduces that to a single list-of-lists [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
- maps over that list-of-lists, which has the action of multiplying each sub-list by 2 (which duplicates)
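The steps above can be reproduced in plain Python, without Prefect, as a rough sketch. The list comprehensions here only stand in for what the mapped tasks do as described in this thread; they are an illustration, not Prefect's actual implementation.

```python
# Plain-Python stand-in for the flow above; no Prefect required.
def transform1(x):
    return [x, x * 2, x * 3]


def transform2(x):
    # For a list argument, `* 2` repeats the list rather than doubling elements.
    return x * 2


extracted = [1, 2, 3]

# Stand-in for transform1.map(e): one sublist per input element.
t1 = [transform1(x) for x in extracted]

# Stand-in for transform2.map(t1): each call receives a whole sublist.
t2 = [transform2(sublist) for sublist in t1]

print(t1)  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
print(t2)  # [[1, 2, 3, 1, 2, 3], [2, 4, 6, 2, 4, 6], [3, 6, 9, 3, 6, 9]]
```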
b
OK, I think I see what you're saying. So it sounds like the workflow I imagined would maybe still have a loop to work on each item in the nested iterable. Am I maybe thinking about this from a non-idiomatic perspective for Prefect? Can I fan out more than once, or in general is that not a thing?
j
Couple things come to mind here:
- a reduce step in between the maps could flatten the list so it could be trivially mapped over
- We could introduce some sort of “access” to the map() call, which would allow us to define custom ways of accessing the mapped object. Your list-of-lists is one example, but mapping over a dictionary (accessing values/items vs keys) could be another
But @Brian McFeeley you’re correct that we don’t have a native “nested” map operation!
(at the moment 😉 )
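A minimal sketch of the flatten idea, again in plain Python so it runs without Prefect. The name `flatten` is hypothetical and not a Prefect built-in mentioned in this thread; in the flow it would presumably be an ordinary `@task` called without `.map`, sitting between `transform1.map(e)` and `transform2.map(...)`.

```python
from itertools import chain


def flatten(nested):
    """Hypothetical reduce step: collapse a list of lists into one flat list."""
    return list(chain.from_iterable(nested))


def transform2(x):
    return x * 2


# Reduced output of the first map, as in the example above.
t1 = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]

# Reduce: one call sees the whole list-of-lists and flattens it.
flat = flatten(t1)

# Stand-in for transform2.map(flat): each call now receives a scalar.
t2 = [transform2(x) for x in flat]

print(flat)  # [1, 2, 3, 2, 4, 6, 3, 6, 9]
print(t2)    # [2, 4, 6, 4, 8, 12, 6, 12, 18]
```

The trade-off is that the grouping by original element is lost after flattening, which only matters if downstream tasks need to know which sublist a value came from.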
b
The flattening sounds reasonable to me; there's no big reason they need to be distinctly grouped.