Hey team, quick one about mapping. If I have some code like so | Prefect Community #ask-community

Ben Muller

10/19/2021, 12:44 AM

Hey team, quick one about mapping. If I have some code like so:

# Prefect 1.x imports; the tasks, days_back and days_ahead are defined elsewhere
from prefect import Flow, apply_map, unmapped

with Flow(
    name="horse_racing_data",
) as flow:
    dates = get_dates_task(days_back=days_back, days_ahead=days_ahead, dt_format="%d-%b-%Y")

    raw_sectional_data = apply_map(get_puntingform_sectional_dump_task, date=dates)

    spell_stats_data = apply_map(
        query_db_for_df_task, path_to_sql=unmapped("sql/select_spell_count.sql")
    )

    enriched_pf_data = apply_map(
        calculate_runners_spell_stats_task, pf_sectional_df=raw_sectional_data, spell_data=spell_stats_data
    )

I am making multiple separate `apply_map` calls, and I just wanted to check: when calling `calculate_runners_spell_stats_task`, can I guarantee the order of the returned maps? What I mean is that `raw_sectional_data` and `spell_stats_data` are iterables, and since they are passed to the function together, it is important that they maintain the same order. Am I all good here?

Kevin Kho

10/19/2021, 1:27 AM

Why are you using `apply_map` over `map`? Seems like you only have one task per `apply_map`. I think the order is guaranteed, though. Order for mapped tasks is guaranteed, and I went over the code: all it does is unpack the `apply_map` and make a flow for it. Just note that the lengths should be equal, per these comments.
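To make the point about order and equal lengths concrete, here is a minimal plain-Python sketch of position-based pairing: element i of one input is handed to the function together with element i of the other. This is only an illustration of the behaviour described above, not Prefect's implementation; `map_pairwise` and `describe` are hypothetical names invented for this example.

```python
from typing import Callable, Iterable, List


def map_pairwise(
    func: Callable[[str, str], str],
    first: Iterable[str],
    second: Iterable[str],
) -> List[str]:
    """Call func on items that share the same position in both inputs."""
    first_items = list(first)
    second_items = list(second)
    # Pairing by position only makes sense when the lengths match.
    if len(first_items) != len(second_items):
        raise ValueError(
            f"length mismatch: {len(first_items)} vs {len(second_items)}"
        )
    return [func(a, b) for a, b in zip(first_items, second_items)]


def describe(sectional: str, spell: str) -> str:
    # Hypothetical stand-in for a task that consumes both mapped inputs.
    return f"{sectional}+{spell}"


paired = map_pairwise(
    describe,
    ["sec_0", "sec_1", "sec_2"],
    ["spell_0", "spell_1", "spell_2"],
)
print(paired)
```

If the two inputs are produced in the same order, each call sees a matching pair; if one of them is shuffled, the pairing silently goes wrong, which is why the ordering question matters.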

Ben Muller

10/19/2021, 1:30 AM

Haha, I thought `apply_map` is `map`. What is the difference? I think someone pointed me to `apply_map` on this channel before.

Ben Muller

10/19/2021, 1:31 AM

and yep, lengths are the same

Kevin Kho

10/19/2021, 1:32 AM

`apply_map` is for a complex sequence of tasks. In the example, `inc_or_negate` is not a task. It’s a function that uses tasks.

Ben Muller

10/19/2021, 1:33 AM

ah, right, so are there performance benefits in changing to map?

Kevin Kho

10/19/2021, 1:34 AM

Simplifies the code, I think. The performance difference will be minimal, but yes, `apply_map` is redundant here.

Kevin Kho

10/19/2021, 1:35 AM

I think you want something like the iterated mapping example

Ben Muller

10/19/2021, 1:35 AM

hmm, kind of

Ben Muller

10/19/2021, 1:39 AM

I just tested it out and the order is definitely not maintained

Ben Muller

10/19/2021, 1:54 AM

the order appears to depend on the time taken for each task to complete

Ben Muller

10/19/2021, 1:54 AM

might need to implement an intermediate task that combines the outputs and reorders them
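One way such an intermediate step could look, sketched in plain Python rather than as a Prefect task: each mapped output carries a key (here the date string), and the combining step lines the two result lists up by that key instead of by position. This assumes every output can be tagged with a unique key; `align_by_key` and the sample values are hypothetical and only stand in for the real dataframes.

```python
from typing import Any, Dict, List, Tuple

Keyed = Tuple[str, Any]  # (key, payload), e.g. (date string, dataframe)


def align_by_key(left: List[Keyed], right: List[Keyed]) -> List[Tuple[str, Any, Any]]:
    """Pair payloads that share a key, following the order of `left`."""
    right_by_key: Dict[str, Any] = dict(right)
    missing = [key for key, _ in left if key not in right_by_key]
    if missing:
        raise KeyError(f"no matching entry for keys: {missing}")
    return [(key, payload, right_by_key[key]) for key, payload in left]


# The two lists arrive in different orders, as if the tasks finished at different times.
sectionals = [("19-Oct-2021", "sectional_b"), ("18-Oct-2021", "sectional_a")]
spells = [("18-Oct-2021", "spell_a"), ("19-Oct-2021", "spell_b")]

aligned = align_by_key(sectionals, spells)
print(aligned)
```

Because the pairing depends on the key and not on the position, it does not matter in which order the upstream results come back.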

Ben Muller

10/19/2021, 1:57 AM

in saying that, when the mapped outputs are dataframes, this becomes quite complicated. Surely I am not the only person to run into this issue

Kevin Kho

10/19/2021, 2:02 AM

For apply_map or map? Could you show me the code snippet?

Ben Muller

10/19/2021, 2:02 AM

give me a second, I might have missed the point of apply_map and it actually solves my issue

Kevin Kho

10/19/2021, 2:03 AM

Cuz consecutive mapped tasks are compressed and executed together, which is what guarantees the order (on Dask)
