# ask-community
b
Hey team, quick one about mapping. If I have some code like so:
with Flow(
    name="horse_racing_data",
) as flow:
    dates = get_dates_task(days_back=days_back, days_ahead=days_ahead, dt_format="%d-%b-%Y")

    raw_sectional_data = apply_map(get_puntingform_sectional_dump_task, date=dates)


    spell_stats_data = apply_map(
        query_db_for_df_task, path_to_sql=unmapped("sql/select_spell_count.sql")
    )
    
    enriched_pf_data = apply_map(
        calculate_runners_spell_stats_task, pf_sectional_df=raw_sectional_data, spell_data=spell_stats_data
    )
I am making multiple separate apply_map calls, and I just wanted to check: when calling calculate_runners_spell_stats_task, can I rely on the order of the mapped results? What I mean is that raw_sectional_data and spell_stats_data are iterables, and since they are passed to the function together, it is important that they keep the same order. Am I all good here?
k
Why are you using apply_map over map? Seems like you only have one task per apply_map call. I think the order is guaranteed though. Order for mapped tasks is guaranteed, and I went over the code and all it does is unpack the apply_map and build the flow for it. Just note that the lengths should be equal per these comments
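To make the "same length, same position" point concrete, here is a plain-Python analogy. It is not Prefect's implementation: `Unmapped`, `map_call`, and `describe` are hypothetical names invented for this sketch, and the strict length check is an assumption based on the note above that the lengths should be equal.

```python
class Unmapped:
    """Hypothetical stand-in for a value that is passed whole to every call."""

    def __init__(self, value):
        self.value = value


def map_call(fn, **kwargs):
    """Call fn once per position, pairing mapped arguments element by element."""
    mapped = {k: list(v) for k, v in kwargs.items() if not isinstance(v, Unmapped)}
    constant = {k: v.value for k, v in kwargs.items() if isinstance(v, Unmapped)}

    # All mapped arguments must share one length so positions line up.
    lengths = {len(v) for v in mapped.values()}
    if len(lengths) != 1:
        raise ValueError(f"mapped arguments need one shared length, got {sorted(lengths)}")
    (n,) = lengths

    # Element i of every mapped argument goes into call i.
    return [fn(**{k: v[i] for k, v in mapped.items()}, **constant) for i in range(n)]


def describe(date, stats, source):
    return f"{date}|{stats}|{source}"


paired = map_call(
    describe,
    date=["01-Jan-2022", "02-Jan-2022"],
    stats=["stats for 01", "stats for 02"],
    source=Unmapped("db"),
)
```

In this sketch, the result at position i only makes sense if both input lists describe the same item at position i, which is why the order of each upstream mapped result matters.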
b
haha I thought apply_map was map, what is the difference? I think someone pointed me to apply_map on this channel before
and yep, lengths are the same
k
Apply map is for a complex sequence of tasks. In the example, inc_or_negate is not a task. It’s a function that uses tasks.
b
ah, right, so are there performance benefits in changing to map?
k
Simplifies the code I think. I think performance difference will be minimal, but yes it is redundant.
I think you want something like the iterated mapping example
b
hmm, kind of
I just tested it out and the order is definitely not maintained
the order appears to depend on the time taken for each task to complete
might need to implement an intermediate task that combines the outputs and reorders them
in saying that, when the mapped outputs are dataframes, this becomes quite complicated. Surely I am not the only person to run into this issue
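One way to sketch the "intermediate task that combines the outputs" idea is to have each mapped step return its key (for example the date) alongside its payload, then pair by key instead of by position. This is a minimal plain-Python sketch under that assumption; `align_by_key` and the sample records are hypothetical, and the payloads could just as well be dataframes.

```python
def align_by_key(left, right):
    """Pair (key, payload) records from two lists by key, keeping left's order."""
    right_by_key = dict(right)

    # Fail loudly rather than silently pairing mismatched records.
    missing = [key for key, _ in left if key not in right_by_key]
    if missing:
        raise KeyError(f"no matching record for keys: {missing}")

    return [(key, payload, right_by_key[key]) for key, payload in left]


# Hypothetical outputs of two mapped steps that arrived in different orders.
sectionals = [
    ("01-Jan-2022", "sectional A"),
    ("02-Jan-2022", "sectional B"),
    ("03-Jan-2022", "sectional C"),
]
spell_stats = [
    ("03-Jan-2022", "spell C"),
    ("01-Jan-2022", "spell A"),
    ("02-Jan-2022", "spell B"),
]

aligned = align_by_key(sectionals, spell_stats)
```

Because the pairing is done on the key, it no longer matters which step finished first; the payload itself never has to be inspected or sorted.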
k
For apply_map or map? Could you show me the code snippet?
b
give me a second, I might have missed the point of apply_map and it actually solves my issue
k
Cuz consecutive mapped tasks are compressed and executed together, which is what guarantees the order (on Dask)
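As a standard-library analogy only (this is not how Prefect or Dask is implemented), the difference between "results in input order" and "results in completion order" can be seen with `concurrent.futures`. `slow_square` and the delays are made up for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(item):
    """Sleep for the given delay, then return the square of the value."""
    delay, value = item
    time.sleep(delay)
    return value * value


# The first item is deliberately the slowest.
items = [(0.3, 1), (0.0, 2), (0.15, 3)]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Executor.map yields results in the order of the inputs,
    # regardless of which call finishes first.
    in_input_order = list(pool.map(slow_square, items))

    # as_completed yields futures as they finish, so the order
    # follows how long each call took.
    futures = [pool.submit(slow_square, item) for item in items]
    in_completion_order = [future.result() for future in as_completed(futures)]
```

If results are being observed in completion order, as in the test described earlier in the thread, it is worth checking whether they are being collected positionally from the mapped result or through some side channel such as logs.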