Ben Muller

    11 months ago
    Hey team, quick one about mapping. If I have some code like so:
    # Assumed imports for this snippet (Prefect 1.x-style API); the *_task objects,
    # days_back and days_ahead are defined elsewhere.
    from prefect import Flow, apply_map, unmapped

    with Flow(
        name="horse_racing_data",
    ) as flow:
        dates = get_dates_task(days_back=days_back, days_ahead=days_ahead, dt_format="%d-%b-%Y")

        raw_sectional_data = apply_map(get_puntingform_sectional_dump_task, date=dates)

        spell_stats_data = apply_map(
            query_db_for_df_task, path_to_sql=unmapped("sql/select_spell_count.sql")
        )

        enriched_pf_data = apply_map(
            calculate_runners_spell_stats_task,
            pf_sectional_df=raw_sectional_data,
            spell_data=spell_stats_data,
        )
    I am making multiple separate apply_map calls, and I want to make sure that when I call calculate_runners_spell_stats_task the order of the mapped results is guaranteed. What I mean is that raw_sectional_data and spell_stats_data are iterables, and it is important that they keep the same order when they are passed to the function. Am I all good here?
    Kevin Kho

    11 months ago
    Why are you using apply_map over map? It seems like you only have one task per apply_map call. I think the order is guaranteed, though. Order for mapped tasks is guaranteed, and I went over the code: all it does is unpack the apply_map and make a flow for it. Just note that the lengths should be equal, per these comments
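As a rough illustration of what "order guaranteed, lengths equal" means for the question above, here is a plain-Python sketch with no Prefect involved. It assumes that mapping over two inputs behaves like an index-wise zip; the helper name pair_mapped_inputs and the sample values are hypothetical and not from the thread.

```python
from typing import List, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def pair_mapped_inputs(left: Sequence[A], right: Sequence[B]) -> List[Tuple[A, B]]:
    """Pair two mapped inputs by position, refusing unequal lengths."""
    if len(left) != len(right):
        raise ValueError(f"length mismatch: {len(left)} != {len(right)}")
    # Element i of `left` is always paired with element i of `right`.
    return list(zip(left, right))


# Hypothetical stand-ins for the two mapped results in the question.
sectional = ["sectional-01-Jan", "sectional-02-Jan"]
spells = ["spells-01-Jan", "spells-02-Jan"]
pairs = pair_mapped_inputs(sectional, spells)
```

The explicit length check is a design choice for the sketch: a bare zip would silently drop trailing items, which would hide exactly the kind of mismatch the lengths note warns about.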
    Ben Muller

    11 months ago
    haha I thought apply_map was map, what is the difference? I think someone pointed me to apply_map on this channel before
    and yep, lengths are the same
    Kevin Kho

    11 months ago
    Apply map is for a complex sequence of tasks. In the example, the inc_or_negate is not a task. It’s a function that uses tasks.
    Ben Muller

    11 months ago
    ah, right, so are there performance benefits in changing to map?
    Kevin Kho

    11 months ago
    Simplifies the code I think. I think performance difference will be minimal, but yes it is redundant.
    I think you want something like the iterated mapping example
    Ben Muller

    11 months ago
    hmm, kind of
    I just tested it out and the order is definitely not maintained
    the order appears to depend on the time taken for each task to complete
    might need to implement an intermediate task that combines the outputs and reorders them
    in saying that, when the mapped outputs are dataframes, this becomes quite complicated. Surely I am not the only person to run into this issue
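The "intermediate task that combines the outputs and reorders them" idea could look roughly like the sketch below. It is plain Python rather than a Prefect task, and it assumes each mapped step can return its input index alongside its result; the function name reorder_by_index and the string payloads (standing in for dataframes) are hypothetical.

```python
from typing import Any, Iterable, List, Tuple


def reorder_by_index(tagged: Iterable[Tuple[int, Any]]) -> List[Any]:
    """Restore input order from (input_index, result) pairs."""
    # Sort on the index only, so the payloads never need to be comparable.
    return [result for _, result in sorted(tagged, key=lambda pair: pair[0])]


# Results listed in completion order rather than input order.
arrived = [(2, "df-for-date-2"), (0, "df-for-date-0"), (1, "df-for-date-1")]
ordered = reorder_by_index(arrived)
```

Sorting on the index alone matters when the payloads are dataframes, since those cannot be compared with each other the way plain tuples would require.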
    Kevin Kho

    11 months ago
    For apply_map or map? Could you show me the code snippet?
    Ben Muller

    11 months ago
    give me a second, I might have missed the point of apply_map and it actually solves my issue
    Kevin Kho

    11 months ago
    Cuz consecutive mapped tasks are compressed and executed together, which is what guarantees the order (on Dask)
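To make the "executed together" point concrete, here is a standard-library analogy. It is not how Prefect or Dask implement mapping; it only shows the idea that when both steps run back to back for the same element, each result stays tied to its input position even if elements finish at different times. The functions fetch, enrich and fetch_then_enrich are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(day: int) -> dict:
    # Earlier days sleep longer, so they finish last.
    time.sleep(0.02 * (3 - day))
    return {"day": day}


def enrich(record: dict) -> dict:
    # Second step, applied to the output of the first step.
    return {**record, "enriched": True}


def fetch_then_enrich(day: int) -> dict:
    # Both steps run back to back for the same element.
    return enrich(fetch(day))


with ThreadPoolExecutor(max_workers=3) as pool:
    # Executor.map yields results in input order, not completion order.
    results = list(pool.map(fetch_then_enrich, [0, 1, 2]))
```

Here day 2 finishes first and day 0 last, yet results still lists days 0, 1, 2 in that order, because each element carries its position through both steps.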