David Haynes

    4 months ago
    Fun with dataframes and map. So I have a map function that loads a dataframe in one task and then uses that dataframe in the next task. When I check the return type in the first task, it is pandas.core.frame.DataFrame. When I check the type of the dataframe value in the second task, it is <class 'list'>. I have the type hint set to pandas.DataFrame in the second task. Does anyone have any idea why the DataFrame is becoming a list?
    Kevin Kho

    4 months ago
    The type hint is not read or enforced at runtime. It’s just a hint. Is your second task a map too?
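Python itself never checks annotations when a function is called, so setting the hint to pandas.DataFrame does not convert or validate what the task receives. A minimal plain-Python sketch of that point; second_task is a hypothetical name, and a string annotation is used so pandas does not need to be installed:

```python
def second_task(df: "pandas.DataFrame") -> str:
    # The annotation is stored as metadata only; nothing checks it at call time
    return type(df).__name__


# A list is accepted without any error, despite the DataFrame hint
print(second_task([1, 2, 3]))             # list
print(second_task.__annotations__["df"])  # pandas.DataFrame
```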
    David Haynes

    4 months ago
    The second task isn't a map. The flow is: a set of files to map, read each file into a dataframe, pass the dataframe to the next task.
    Kevin Kho

    4 months ago
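    If task one is mapped and task two is not, task two receives a list of all the mapped results so it can reduce them:
    from prefect import task, Flow  # Prefect 1.x API
    
    @task
    def abc(x):
        return x + 1
    
    @task
    def bcd(list_x):
        # Not mapped, so it runs once and receives every mapped result as a list
        return sum(list_x)
    
    with Flow(...) as flow:
        a = abc.map([1, 2, 3, 4, 5])  # one abc run per element
        b = bcd(a)                    # reduce: bcd receives [2, 3, 4, 5, 6]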
    a will be of type list here because it is the output of a map, and bcd will take in a list because it is a reduce step.
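Stripped of Prefect, the behavior Kevin describes is an ordinary map followed by a reduce. The plain-Python sketch below is not Prefect's implementation: run_mapped and read_file are hypothetical stand-ins, and a dict stands in for a DataFrame so the example has no dependencies. It shows why a step that returns one DataFrame per file still hands a list to an unmapped downstream step:

```python
from typing import Any, Callable, Iterable, List


def run_mapped(fn: Callable[[Any], Any], items: Iterable[Any]) -> List[Any]:
    # Hypothetical stand-in for .map: one call per item, results collected in a list
    return [fn(item) for item in items]


def read_file(path: str) -> dict:
    # Hypothetical reader; each mapped run returns a single "frame" (a dict here)
    return {"source": path}


def second_task(frames):
    # Unmapped downstream step: it receives the whole list of mapped results at once
    return type(frames).__name__, len(frames)


frames = run_mapped(read_file, ["a.csv", "b.csv", "c.csv"])
print(type(frames[0]).__name__)  # dict
print(second_task(frames))       # ('list', 3)
```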
    David Haynes

    4 months ago
    OK, thanks. I think I will have to combine the subsequent tasks into one to use the map.
    This is all an attempt to add concurrency to the tasks in a flow. Maybe I need to look at a flow of flows with Parameters instead?
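Combining the subsequent tasks into one, as David suggests, means all per-file work happens inside a single mapped function, so each mapped run keeps its own DataFrame from start to finish. A plain-Python sketch of that shape; read_and_summarize and the in-memory FILES contents are hypothetical, and dicts stand in for DataFrames. In Prefect 1.x, a function like this would be decorated with @task and called with .map:

```python
FILES = {"a.csv": [1, 2, 3], "b.csv": [10, 20]}  # hypothetical file contents


def read_and_summarize(path: str) -> dict:
    # Read and process in one step, so the intermediate "frame"
    # (a dict standing in for a DataFrame) never crosses a task boundary
    frame = {"source": path, "values": FILES[path]}
    return {"source": frame["source"], "total": sum(frame["values"])}


# One call per path, as a mapped task would do
results = [read_and_summarize(p) for p in ["a.csv", "b.csv"]]
print(results)  # [{'source': 'a.csv', 'total': 6}, {'source': 'b.csv', 'total': 30}]
```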
    QQ: In your example, if b = bcd.map(a), would the map essentially iterate over a? That is, in my case, would an additional map let me access each dataframe as an element of the mapped list?
    Never mind. I discovered that I was correct. Thanks for your help.
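The chained-map pattern David confirmed, sketched in plain Python rather than Prefect: when the downstream step is mapped too, it runs once per upstream result, and each run sees a single element instead of the whole list. run_mapped, read_file, process_frame, and the FILES contents are hypothetical, and dicts stand in for DataFrames. In Prefect 1.x terms this corresponds to mapping one task over the files and then mapping a second task over that result:

```python
from typing import Any, Callable, Iterable, List

FILES = {"a.csv": [1, 2, 3], "b.csv": [10, 20]}  # hypothetical file contents


def run_mapped(fn: Callable[[Any], Any], items: Iterable[Any]) -> List[Any]:
    # Stand-in for .map: call fn once per item and collect the results
    return [fn(item) for item in items]


def read_file(path: str) -> dict:
    # One "frame" per file; a dict stands in for a DataFrame
    return {"source": path, "values": FILES[path]}


def process_frame(frame: dict) -> int:
    # Mapped downstream step: receives ONE frame, not the list of frames
    assert isinstance(frame, dict)
    return sum(frame["values"])


frames = run_mapped(read_file, ["a.csv", "b.csv"])  # first map
totals = run_mapped(process_frame, frames)          # second map iterates over frames
print(totals)  # [6, 30]
```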