Leanna Morinishi
12/14/2021, 5:39 PM
query_list is a list of str, each of which is a valid query. Let’s say there are 3 queries and each query gives me back 10 rows. This task works fine.
data = execute_test_query.map(query=query_list)
Now I want to transform a concatenated dataframe in its entirety.
@task
def n_rows(df):
    rows = df.shape[0]
    print(f"There are {rows} rows!")
    return rows
I was expecting data_2 = n_rows.map(flatten(data)) to give "There are 30 rows!", but I get key errors. Any idea what I need to do to flatten data?
John Shearer
12/14/2021, 5:59 PM
John Shearer
12/14/2021, 6:05 PM
from prefect import Flow, task
@task
def some_task(value: int) -> int:
    return value + 42
@task
def some_reduce_task(value_list: list[int]) -> int:
    # intentionally very obvious reduce
    total = 0  # named total to avoid shadowing the built-in sum
    for value in value_list:
        total = total + value
    return total
with Flow("map-reduce-example") as flow:  # flow name is arbitrary; Flow needs one
    some_list = [1,2,3,4,5,42]
    some_task_result_list = some_task.map(some_list)  # runs some_task on each element of the list separately
    reduced_result = some_reduce_task(some_task_result_list)  # receives the full list of mapped results
John Shearer
12/14/2021, 6:07 PM
@task
def my_reduce_task(df_list: list[pd.DataFrame]) -> pd.DataFrame:
    df = pd.concat(df_list)
    return df
Kevin Kho
List[pd.DataFrame] and then use pd.concat to combine your list of dataframes before calling shape. You should not need to use flatten.
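A plain-Python sketch of that shape may help, assuming no Prefect or pandas at all: lists of dict rows stand in for DataFrames, `run_query` is a hypothetical stand-in for `execute_test_query`, and `chain.from_iterable` plays the role of `pd.concat`. The point it illustrates is that the mapped step yields one result per query, and the reduce step is called once on the whole list rather than mapped.

```python
from itertools import chain
from typing import Dict, List

Row = Dict[str, int]  # hypothetical stand-in for one DataFrame row


def run_query(query: str) -> List[Row]:
    """Stand-in for execute_test_query: pretend every query returns 10 rows."""
    return [{"query_len": len(query), "i": i} for i in range(10)]


def n_rows(tables: List[List[Row]]) -> int:
    """Reduce step: receives the full list of results, not a single element."""
    combined = list(chain.from_iterable(tables))  # plays the role of pd.concat
    print(f"There are {len(combined)} rows!")
    return len(combined)


query_list = ["select 1", "select 2", "select 3"]

# Equivalent of the mapped call: one result per query.
data = [run_query(q) for q in query_list]

# Called normally (not mapped), so it sees all three results at once.
total = n_rows(data)
```

In a Prefect flow, both functions would be `@task` functions, the list comprehension would be `execute_test_query.map(query=query_list)`, and `n_rows(data)` would be called without `.map`.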
You can also just return the row count of each one. Reduce them in a function and return the total. I would suggest this.
Leanna Morinishi
12/14/2021, 6:10 PM