Hi all Say that I have two tasks A and B that I both call th Prefect Community #ask-community

Tim Galvin

06/21/2024, 8:37 AM

Hi all. Say that I have two tasks, A and B, that I both call through the `.map` interface. Is there a way to make task B wait on the corresponding item from task A to finish via the `wait_for=` argument?

numbers = list(range(100))
a_futures = task_a.map(numbers)
b_futures = task_b.map(numbers, wait_for=a_futures)

My understanding is that in this case task_b will wait for every item in the list provided to `wait_for` to finish before any task_b run starts. This is a contrived example. Yes, I could change the signature of task_b to accept outputs from task_a. I have done this, but it feels smelly, as a lot of the output is not needed.

Nate

06/21/2024, 2:22 PM

hi @Tim Galvin - are you asking if it's possible to have a situation where the `i`th future in `b_futures` would wait on the `i`th future from `a_futures`?

Kevin Grismore

06/21/2024, 2:25 PM

the way I've done this is to iterate over the list in a for loop and `.submit` those two tasks in sequence, having B `wait_for` the future from A in each iteration. Maybe Nate has a nicer answer though

Tim Galvin

06/21/2024, 3:27 PM

Yes - that is exactly what I mean @Nate. The `.map` interface is very clean and makes life a breeze. I am just unsure what the best way to do things is. In this example it is trivial to 'make' task b accept some input from task a and ignore it, but for some more complex things it may be a little ugly

Tim Galvin

06/21/2024, 3:28 PM

A loop over a `.submit` would be workable as well, but it can get ugly quick. If there was something like an `unmapped` but for the `wait_for`, that would be exactly what I am looking for

Nate

06/21/2024, 4:41 PM

i think what kevin suggested is the simplest way to accomplish that, but perhaps you could get clever with `as_completed` and `.submit` each B task as soon as the matching A future's `.result` call completes. do you have an example where the for loop with sequential submits gets ugly?
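
The `as_completed` idea can be illustrated with the standard library alone. The sketch below uses `concurrent.futures`, not Prefect's own futures, so treat it as an analogy for the pattern rather than Prefect code; `step_a`, `step_b`, and `run_pairwise` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def step_a(n: int) -> int:
    # Placeholder for the upstream work.
    return n * 2


def step_b(n: int) -> int:
    # Placeholder for the downstream work; it does not use step_a's output.
    return n + 1


def run_pairwise(numbers: list[int]) -> dict[int, int]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Remember which input each A future belongs to.
        a_futures = {pool.submit(step_a, n): n for n in numbers}
        b_futures = {}
        for a_future in as_completed(a_futures):
            a_future.result()  # re-raises here if A failed for this item
            n = a_futures[a_future]
            # Start B for this item as soon as its own A has finished.
            b_futures[n] = pool.submit(step_b, n)
        return {n: future.result() for n, future in b_futures.items()}


if __name__ == "__main__":
    print(run_pairwise(list(range(5))))
```

The dictionary keyed by future is what lets each finished A be matched back to its input, since `as_completed` yields futures in completion order rather than submission order.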

Tim Galvin

06/22/2024, 9:07 AM

Not exactly, tbh. It just becomes a little annoying when I have something like this

telescope_mss = glob("*.ms")
image_data = image_ms_data.map(ms=telescope_mss)
zip = zip_ms_data.map(ms=telescope_mss, wait_for=image_data)

This is kind of the situation I see myself lurking around. I love using the `.map` interface, it makes the code very readable and easy to follow. In the above situation I cannot zip the measurement sets (these nasty radio-telescope formats that are folders in folders, many small files and many big files that HPCs absolutely hate) until the imaging is finished. Although I could pass the image_data future into the input of zip_ms, it would mean I either have to modify the return of `image_ms_data` and update what `zip_ms` expects, or add a dummy arg input to make the dependency that way. This type of use case pops up a little for me. As I try to convince the powers that be that we have a workable, maintainable solution, I'd love to keep my usage of prefect as consistent as possible.

Nate

06/22/2024, 4:18 PM

that makes a lot of sense. perhaps there’s a util we could explore for when you want to apply a series of tasks to the same input and have each future/result proceed independently 🧐 if you’d be up for an enhancement ticket, it’d certainly be appreciated, otherwise i’m happy to make one

Tim Galvin

06/23/2024, 3:37 AM

I'll make one when I am in the office next 🙂

🙏 1
