operator coming? That is a big stumbling block for us adopting 2.0. Thanks!
Kevin Kho
07/20/2022, 9:24 PM
Why doesn’t a for loop suffice for your use case? Map is probably a bit further out because it would need to integrate with Ray/Dask.
Tim Enders
07/20/2022, 9:26 PM
The for loop doesn't seem to be actually parallel in my testing, and that is what we use map for: to parallelize across a local Dask cluster. We have some flows that have to make a lot of API calls, and they already take an hour or so at 5x parallel.
Tim Enders
07/20/2022, 9:27 PM
By a lot I mean tens of thousands. We are trying to get a reporting API, but it isn't a priority on the engineering side.
Kevin Kho
07/20/2022, 9:28 PM
Ah ok, I know what you mean, but you can try a list comprehension:
[task_one(x) for x in items]
This will be parallel compared to the for loop, because a for loop makes it easy to block execution.
Tim Enders
07/20/2022, 9:29 PM
Hmm, thank you. I will look into it!
Kevin Kho
07/20/2022, 9:31 PM
for item in items:
    a = task_one(item)
    b = task_two(a).result()
Each iteration will wait for the previous iteration's tasks to complete before the next one starts, I think, because of the result call. It is very easy to run into this.
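To make the difference concrete, here is a minimal sketch that uses the standard library's concurrent.futures as a stand-in for the task runner. This is an assumption for illustration only and is not Prefect's API; task_one and task_two are hypothetical placeholders that sleep to simulate an API call, and the pool size of 5 simply mirrors the "5x parallel" mentioned above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def task_one(x):
    # Hypothetical task: simulate a slow API call, then transform the input.
    time.sleep(0.05)
    return x * 2


def task_two(a):
    # Hypothetical downstream task that depends on task_one's output.
    time.sleep(0.05)
    return a + 1


def blocking_loop(pool, items):
    # Calling .result() inside the loop waits for both tasks to finish
    # before the next iteration submits anything, so work runs serially.
    results = []
    for item in items:
        a = pool.submit(task_one, item)
        b = pool.submit(task_two, a.result()).result()
        results.append(b)
    return results


def submit_then_collect(pool, items):
    # Submit every task_one call first so they all run concurrently,
    # then submit the task_two calls, and only block at the very end.
    firsts = [pool.submit(task_one, item) for item in items]
    seconds = [pool.submit(task_two, f.result()) for f in firsts]
    return [f.result() for f in seconds]


if __name__ == "__main__":
    items = [1, 2, 3, 4, 5]
    with ThreadPoolExecutor(max_workers=5) as pool:
        for fn in (blocking_loop, submit_then_collect):
            start = time.perf_counter()
            out = fn(pool, items)
            elapsed = time.perf_counter() - start
            print(f"{fn.__name__}: {out} in {elapsed:.2f}s")
```

Both functions return the same values; only the point at which the code blocks differs. The same idea applies to the list comprehension suggested above: it helps because nothing in it waits on a result until all the work has been submitted.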