Brian McFeeley
07/25/2019, 3:30 PM
from prefect import task, Flow

@task
def extract():
    return [1, 2, 3]

@task
def transform1(x):
    return [x, x * 2, x * 3]

@task
def load(x):
    print(f"output: {x}")

if __name__ == '__main__':
    with Flow("mapping test") as flow:
        e = extract()
        t1 = transform1.map(e)
        l = load(t1)
    flow.run()
This works like we expect and generates a new list for each of the items in the original list, with output [[1, 2, 3], [2, 4, 6], [3, 6, 9]].
However, let's say I wanted to then map over each of the items in each of these sublists. If I add a new task:
@task
def transform2(x):
    return x * 2
And then change my flow to:
e = extract()
t1 = transform1.map(e)
t2 = transform2.map(t1)
l = load(t2)
I see that each mapped run of the second task received a whole sublist, rather than mapping over the scalars inside it, so the output is [[1, 2, 3, 1, 2, 3], [2, 4, 6, 2, 4, 6], [3, 6, 9, 3, 6, 9]]
i.e. we duplicated each list instead of multiplying each element by 2.
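A plain-Python sketch of what seems to be happening, assuming each .map call behaves like a list comprehension over its input. This is an illustration only, not Prefect's actual implementation; the functions are undecorated copies of the tasks above.

```python
# Plain-Python stand-ins for the tasks above (no Prefect required).
def transform1(x):
    return [x, x * 2, x * 3]

def transform2(x):
    return x * 2

extracted = [1, 2, 3]

# First map: one call per scalar; the results are gathered into a list of lists.
t1 = [transform1(x) for x in extracted]

# Second map: one call per *sublist*, so `x * 2` is list repetition,
# not element-wise multiplication.
t2 = [transform2(sub) for sub in t1]

print(t1)  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
print(t2)  # [[1, 2, 3, 1, 2, 3], [2, 4, 6, 2, 4, 6], [3, 6, 9, 3, 6, 9]]
```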
A real-life example here might be:
• generate or grab a list of S3 buckets
• get the list of files in each bucket
• process each file individually
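The three steps above could be sketched in plain Python roughly as follows. Everything here is hypothetical: FAKE_S3, list_buckets, list_files, process_file and flatten are made-up stand-ins, not real S3 or Prefect calls. The point is only the flattening step between the second and third stages, which a single level of mapping does not perform.

```python
from itertools import chain

# Hypothetical stand-in for real S3 contents; names and data are made up.
FAKE_S3 = {"bucket-a": ["a1.csv", "a2.csv"], "bucket-b": ["b1.csv"]}

def list_buckets():
    # Step 1: generate or grab the list of buckets.
    return sorted(FAKE_S3)

def list_files(bucket):
    # Step 2: list the files in one bucket.
    return [f"{bucket}/{name}" for name in FAKE_S3[bucket]]

def process_file(path):
    # Step 3: placeholder for per-file work.
    return path.upper()

def flatten(nested):
    """Collapse a list of lists into one flat list, preserving order."""
    return list(chain.from_iterable(nested))

buckets = list_buckets()
files_per_bucket = [list_files(b) for b in buckets]  # list of lists
all_files = flatten(files_per_bucket)                # one flat list
results = [process_file(p) for p in all_files]       # one call per file
print(results)
```

In a flow, a helper like flatten could presumably be wrapped as an ordinary task and placed between the two mapped tasks; that wiring is an assumption and is untested here.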
Any ideas? My docs search is coming up dry.
Chris White
07/25/2019, 3:33 PM
[1, 2, 3]
- maps over that list to create three lists [1, 2, 3], [2, 4, 6], etc.
- reduces that to a single list-of-lists [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
- maps over that list-of-lists, which has the action of multiplying each sub-list by 2 (which duplicates)
Brian McFeeley
07/25/2019, 3:50 PM
Jeremiah
07/25/2019, 4:02 PM
flatten the list so it could be trivially mapped over
- We could introduce some sort of “access” to the map() call, which would allow us to define custom ways of accessing the mapped object. Your list-of-lists is one example, but mapping over a dictionary (accessing values/items vs keys) could be another
Brian McFeeley
07/25/2019, 4:09 PM
Chris White
08/05/2019, 9:50 PM
Marvin
08/05/2019, 9:50 PM