Prefect Community #ask-community

Jan Domanski

05/08/2022, 1:30 PM

Hi there everyone, I’m trying to understand how to do simple map tasks with Prefect 2:

from prefect import flow, task

@task
def generate_numbers():
    return [1, 2, 3, 4]

@task 
def compute_sth_expensive(number):
    return number ** 2

@flow
def pipeline():
    result_generate_numbers = generate_numbers()
    results = map(compute_sth_expensive, result_generate_numbers)
    for r in results: r.result() ## ??

Is that an acceptable pattern? I want to do a parallel calculation over result_generate_numbers and then perform some gather-like operation.

Anna Geller

05/08/2022, 1:48 PM

Hi Jan! This page provides an example

Anna Geller

05/08/2022, 1:49 PM

Mapping is on the roadmap - for now, you can solve this by using a for-loop and attaching a Dask, Ray or concurrent task runner to your flow

Jan Domanski

05/08/2022, 1:59 PM

Thank you for your answer! But doesn’t this suffer from the performance problem described here? https://github.com/PrefectHQ/prefect/issues/5653

Anna Geller

05/08/2022, 2:03 PM

This issue is only there to investigate potential extra performance improvements; the logic itself and parallel execution are working fine. Can you say more about the problem you are trying to solve? For many IO-based use cases, such as talking to external APIs and DBs or processing files, the default concurrent task runner plus a for-loop may be all you need to run things fast enough without the overhead of Dask or Ray.

Jan Domanski

05/08/2022, 2:06 PM

Perfect, this is super helpful, thank you

👍 1

Jan Domanski

05/08/2022, 2:06 PM

Yeah, it’s about doing an inexpensive map calculation on a huge list of objects (1e8) but we’re seeing the problem with even 1e2-1e3 objects, so we’ll consider chunking to improve the situation. The looping indeed works for us.
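The chunking idea mentioned above can be sketched in plain Python. This is only an illustration under assumptions: the helper names chunked, compute_chunk and run_in_chunks are hypothetical and are not part of Prefect. The point is that each unit of work handles a batch of items, so any per-task overhead is paid once per chunk rather than once per item.

```python
from itertools import islice


def chunked(items, chunk_size):
    """Yield consecutive lists of at most chunk_size items from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def compute_chunk(numbers):
    # One unit of work now covers a whole chunk instead of a single item,
    # so scheduling overhead is paid once per chunk.
    return [n ** 2 for n in numbers]


def run_in_chunks(numbers, chunk_size):
    results = []
    for chunk in chunked(numbers, chunk_size):
        # In a flow, this call would be the (hypothetical) chunk-level task.
        results.extend(compute_chunk(chunk))
    return results
```

With 1e8 items and a chunk size of 1e5, for example, this would mean about a thousand units of work instead of a hundred million. A suitable chunk size depends on the workload and would need to be measured.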

Jan Domanski

05/08/2022, 2:07 PM

Are the map and for-loop approaches equivalent? They should be, right?

Jan Domanski

05/08/2022, 2:08 PM

What’s the recommended way to aggregate multiple results in a subsequent task? wait_for?

Anna Geller

05/08/2022, 2:10 PM

Roughly, yes. Mapping has some nuances, but it should have the same effect. And yes, to retrieve the results, you'd need to call .result()
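As a standard-library analogy for the point above (this is not Prefect code, and the function names here are made up for illustration): with concurrent.futures, submitting work in a for-loop and using the executor's map produce the same results in the same order. The loop version hands back futures, so each value is retrieved with .result() before a gather-style step such as a sum.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_sth_expensive(number):
    return number ** 2


def with_loop(numbers):
    # The thread pool stands in for a concurrent task runner (analogy only).
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each submit returns a future immediately.
        futures = [pool.submit(compute_sth_expensive, n) for n in numbers]
        # .result() blocks until that particular item has finished.
        return [f.result() for f in futures]


def with_map(numbers):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Executor.map yields results in input order.
        return list(pool.map(compute_sth_expensive, numbers))


def aggregate(results):
    # Gather-like step that runs once all results are available.
    return sum(results)


if __name__ == "__main__":
    numbers = [1, 2, 3, 4]
    print(with_loop(numbers) == with_map(numbers))
    print(aggregate(with_loop(numbers)))
```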

Jan Domanski

05/08/2022, 2:15 PM

Okay, thank you

👍 1
