Leanna Morinishi
12/14/2021, 5:39 PM
query_list is a list of str, each of which is a valid query. Let’s say there’s 3 queries and each query gives me back 10 rows. This task works fine.
data = execute_test_query.map(query=query_list)
Now I want to transform a concatenated dataframe in its entirety.
@task
def n_rows(df):
    rows = df.shape[0]
    print(f"There are {rows} rows!")
    return rows
I was expecting data_2 = n_rows.map(flatten(data)) to give "There are 30 rows!", but I get key errors. Any idea what I need to do to flatten data?
John Shearer
12/14/2021, 5:59 PM
John Shearer
12/14/2021, 6:05 PM
from prefect import Flow, task

@task
def some_task(value: int) -> int:
    return value + 42

@task
def some_reduce_task(value_list: list[int]) -> int:
    # intentionally very obvious reduce
    total = 0
    for value in value_list:
        total = total + value
    return total

with Flow("map-reduce-example") as flow:  # Prefect 1.x needs a flow name; this one is a placeholder
    some_list = [1, 2, 3, 4, 5, 42]
    some_task_result_list = some_task.map(some_list)  # runs some_task on each element of the list separately
    reduced_result = some_reduce_task(some_task_result_list)  # not mapped: receives the whole list of results
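For intuition, the map-then-reduce shape of the flow above can be mimicked in plain Python, without Prefect. This is only a sketch: add_42 and total_of are hypothetical stand-ins for some_task and some_reduce_task, and in a real flow each mapped call would be its own task run.

```python
def add_42(value: int) -> int:
    # stand-in for some_task: handles a single element
    return value + 42


def total_of(values: list[int]) -> int:
    # stand-in for some_reduce_task: receives the whole list at once
    result = 0
    for value in values:
        result += value
    return result


some_list = [1, 2, 3, 4, 5, 42]
mapped = [add_42(v) for v in some_list]  # one call per element, like .map
reduced = total_of(mapped)               # a single call on the full list
print(mapped, reduced)
```

The point of the split is that the mapped step never sees the full list, while the reduce step always does.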
John Shearer
12/14/2021, 6:07 PM
@task
def my_reduce_task(df_list: list[pd.DataFrame]) -> pd.DataFrame:
    df = pd.concat(df_list)
    return df
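A minimal sketch of that reduce step in plain pandas, outside Prefect. The three 10-row frames are made-up stand-ins for the mapped query results, and combine is a hypothetical, undecorated variant of my_reduce_task (it also passes ignore_index=True, which the original does not).

```python
import pandas as pd


def combine(df_list: list[pd.DataFrame]) -> pd.DataFrame:
    # same idea as my_reduce_task, minus the @task decorator;
    # ignore_index=True renumbers the rows of the combined frame
    return pd.concat(df_list, ignore_index=True)


# made-up data: 3 "queries" returning 10 rows each
frames = [
    pd.DataFrame({"query": [i] * 10, "row": list(range(10))})
    for i in range(3)
]

combined = combine(frames)
row_count = combined.shape[0]
print(f"There are {row_count} rows!")
```

Because the reduce function is called once with the whole list, the row count is taken on the combined frame rather than on each query result separately.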
Kevin Kho
List[pd.DataFrame] and then use pd.concat to combine your list of dataframes before calling shape. You should not need to use flatten.
You can also just return the row count of each one. Reduce them in a function and return the total. I would suggest this.
Leanna Morinishi
12/14/2021, 6:10 PM