# ask-community
t
Hi all, say that I have two tasks, A and B, both of which I call through the `.map` interface. Is there a way to make each item of task B wait for the corresponding item from task A to finish via the `wait_for=` argument?
```python
numbers = list(range(100))
a_futures = task_a.map(numbers)
b_futures = task_b.map(numbers, wait_for=a_futures)
```
My understanding is that in this case `task_b` will wait for all items in the list provided to `wait_for` to finish before starting. This is a contrived example. Yes, I could change the signature of `task_b` to accept outputs from `task_a`. I have done this, but it feels smelly, as a lot of that output is not needed.
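The behaviour Tim expects from `wait_for=a_futures` (every B item waiting on the whole A list) acts like a barrier. A minimal sketch of that shape, using Python's standard `concurrent.futures` as a stand-in rather than Prefect itself; `task_a`, `task_b`, and the shared `log` are hypothetical placeholders, and this models the expected behaviour, not Prefect's scheduler:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Stand-in with concurrent.futures, not Prefect: models "B waits on all of A".
log = []  # records ("a", n) / ("b", n) in the order the work finishes
lock = threading.Lock()


def task_a(n):
    time.sleep(0.01 * (5 - n))  # hypothetical work; earlier items are slower
    with lock:
        log.append(("a", n))
    return n


def task_b(n):
    with lock:
        log.append(("b", n))
    return n * 10


numbers = list(range(5))
with ThreadPoolExecutor(max_workers=8) as pool:
    a_futures = [pool.submit(task_a, n) for n in numbers]
    # Barrier: no B item is submitted until every A item has finished.
    wait(a_futures)
    b_futures = [pool.submit(task_b, n) for n in numbers]
    results = [f.result() for f in b_futures]
```

Even item 4, whose A finishes first, cannot start its B until the slowest A is done.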
n
hi @Tim Galvin - are you asking if it's possible to have a situation where the `i`th future in `b_futures` would wait on the `i`th future from `a_futures`?
k
the way I've done this is to iterate over the list in a for loop and `.submit` those two tasks in sequence, having B `wait_for` the future from A in each iteration. Maybe nate has a nicer answer though
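Kevin's per-item loop, sketched under the same assumptions: `concurrent.futures` stands in for Prefect, and passing the A future into B and calling `.result()` on it plays the role of something like `task_b.submit(n, wait_for=[a_future])`. The task functions and `log` are hypothetical placeholders.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

log = []  # records ("a", n) / ("b", n) in the order the work finishes
lock = threading.Lock()


def task_a(n):
    with lock:
        log.append(("a", n))
    return n


def task_b(n, upstream: Future):
    # Stand-in for wait_for=[a_future]: block until this item's A has finished.
    upstream.result()
    with lock:
        log.append(("b", n))
    return n * 10


numbers = list(range(20))
with ThreadPoolExecutor(max_workers=4) as pool:
    b_futures = []
    for n in numbers:
        a_future = pool.submit(task_a, n)
        # B for item n depends only on A for item n, not on the whole A list.
        b_futures.append(pool.submit(task_b, n, a_future))
    results = [f.result() for f in b_futures]
```

Each B ties up a worker thread while it waits. That is safe here because `ThreadPoolExecutor` picks up work in submission order, so an item's A is already running or done by the time its B starts waiting.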
t
Yes - that is exactly what I mean @Nate. The `.map` interface is very clean and makes life a breeze. I am just unsure what the best way to do things is. In this example it is trivial to "make" task B accept some input from task A and ignore it, but for more complex cases it may get a little ugly.
A loop over `.submit` calls would work as well, but it can get ugly quickly. If there were something like `unmapped`, but for `wait_for`, so that each item waits only on its own upstream future, that would be exactly what I am looking for.
n
i think what kevin suggested is the simplest way to accomplish that. but perhaps you could get clever and use `as_completed` to `.submit` each B task as soon as the corresponding A future completes. do you have an example where the for loop with sequential submits gets ugly?
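Nate's `as_completed` idea, sketched with the standard library's `concurrent.futures.as_completed` rather than any Prefect API (task functions and `log` are hypothetical placeholders). The submitting thread does the waiting and submits each B as soon as its own A finishes, so no worker sits blocked:

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

log = []  # records ("a", n) / ("b", n) in the order the work finishes
lock = threading.Lock()


def task_a(n):
    with lock:
        log.append(("a", n))
    return n


def task_b(n):
    with lock:
        log.append(("b", n))
    return n * 10


numbers = list(range(20))
with ThreadPoolExecutor(max_workers=4) as pool:
    # Map every A future back to the input item it was submitted for.
    item_for = {pool.submit(task_a, n): n for n in numbers}
    b_futures = {}
    for a_future in as_completed(item_for):
        n = item_for[a_future]
        a_future.result()  # re-raises if A failed, so no B is submitted for it
        b_futures[n] = pool.submit(task_b, n)
    results = {n: f.result() for n, f in b_futures.items()}
```

The trade-off versus the plain loop is the extra bookkeeping dict, in exchange for not holding a worker per waiting B.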
t
Not exactly, tbh. It just becomes a little annoying when I have something like this:
```python
from glob import glob

telescope_mss = glob("*.ms")
image_data = image_ms_data.map(ms=telescope_mss)
# zip_data rather than zip, so Python's built-in zip() is not shadowed
zip_data = zip_ms_data.map(ms=telescope_mss, wait_for=image_data)
```
This is the kind of situation I keep finding myself in. I love using the `.map` interface; it makes the code very readable and easy to follow. In the above situation I cannot zip the measurement sets (nasty radio-telescope formats made of folders inside folders, with many small files and many big files, which HPCs absolutely hate) until the imaging is finished. Although I could pass the `image_data` futures into the input of `zip_ms_data`, I would either have to modify the return value of `image_ms_data` and update what `zip_ms_data` expects, or add a dummy input argument just to create the dependency. This type of use case pops up a bit for me. As I try to convince the powers that be that we have a workable, maintainable solution, I'd love to keep my usage of Prefect as consistent as possible.
n
that makes a lot of sense. perhaps there’s a util we could explore for when you want to apply a series of tasks to the same input and have each future/result proceed independently 🧐 if you’d be up for opening an enhancement ticket, it’d certainly be appreciated; otherwise i’m happy to make one
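One possible shape for such a util, purely as a hypothetical sketch: `map_chain` is not an existing Prefect function, and `concurrent.futures` again stands in for Prefect's task runner. It applies a list of steps to every input, lets each step wait only on the previous step for the same input, and returns one list of futures per step so the call site still reads like `.map`. The `image` and `zip_ms` functions are hypothetical stand-ins for `image_ms_data` and `zip_ms_data`.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def _after(step, item, upstream: Future):
    upstream.result()  # wait for this item's previous step; re-raises its error
    return step(item)


def map_chain(pool, steps, items):
    """Hypothetical helper (not a Prefect API): apply `steps` in order to each item.

    Step k for an item waits only on step k-1 for the same item, so items
    progress independently. Returns one list of futures per step, in the
    same order as `items`. Relies on the pool running work in submission
    order, so an upstream step is always picked up before its dependant.
    """
    per_step = [[] for _ in steps]
    for item in items:
        previous = None
        for k, step in enumerate(steps):
            if previous is None:
                previous = pool.submit(step, item)
            else:
                previous = pool.submit(_after, step, item, previous)
            per_step[k].append(previous)
    return per_step


def image(ms):  # hypothetical stand-in for image_ms_data
    return f"imaged {ms}"


def zip_ms(ms):  # hypothetical stand-in for zip_ms_data
    return f"zipped {ms}"


measurement_sets = ["a.ms", "b.ms", "c.ms"]
with ThreadPoolExecutor(max_workers=2) as pool:
    image_futures, zip_futures = map_chain(pool, [image, zip_ms], measurement_sets)
    zipped = [f.result() for f in zip_futures]
```

Unpacking the return value into `image_futures, zip_futures` keeps the two-line feel of the `.map` version while each measurement set is zipped as soon as its own imaging finishes.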
t
I'll make one when I'm next in the office 🙂