Abhas P
09/02/2021, 6:21 PM

from prefect import task, Flow

@task
def transform(df):
    # placeholder: transformed_df stands in for the actual transformation result
    return transformed_df

with Flow("transform-1") as flow:
    transform(df)  # current implementation
    for df in filtered_dfs:  # suggested way to utilize parallelization of tasks
        transform(df)
1. Will breaking the dataframe down into small chunks and then calling the transformation task on each batch (sketched below) help me reap the benefits of Prefect parallelization with a Dask executor?
2. Given that the input to the transform task is a large dataframe, what other steps can I consider to optimize the turnaround time of the flow? (e.g. different run configs and Prefect paradigms for writing the code)
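(For illustration, a minimal sketch of the chunking idea in question 1; the split_into_chunks helper, the chunk count, and the use of numpy.array_split are assumptions added here, not something from the thread:)

import numpy as np
import pandas as pd

def split_into_chunks(df: pd.DataFrame, n_chunks: int = 8) -> list:
    # np.array_split slices the dataframe row-wise into roughly equal batches
    return np.array_split(df, n_chunks)

# e.g. turn one large dataframe into the filtered_dfs batches referenced above
large_df = pd.DataFrame({"a": range(100_000), "b": range(100_000)})
filtered_dfs = split_into_chunks(large_df)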
Dustin Ngo
09/02/2021, 8:40 PM

Prefect supports map on an Iterable[pd.DataFrame] to naturally submit parallel tasks to a Dask executor, and you can use this to construct Dask-friendly flows. You can find examples in our documentation: https://docs.prefect.io/core/concepts/mapping.html
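(A minimal sketch of that mapped pattern, assuming Prefect 1.x with a DaskExecutor; the extract/split/transform task bodies are illustrative placeholders, and only the .map call and executor wiring follow the advice and the linked mapping docs:)

import numpy as np
import pandas as pd
from prefect import task, Flow
from prefect.executors import DaskExecutor

@task
def extract() -> pd.DataFrame:
    # stand-in for loading the large input dataframe
    return pd.DataFrame({"a": range(100_000), "b": range(100_000)})

@task
def split(df: pd.DataFrame, n_chunks: int = 8) -> list:
    # break the large dataframe into smaller batches
    return np.array_split(df, n_chunks)

@task
def transform(df: pd.DataFrame) -> pd.DataFrame:
    # placeholder transformation applied to each batch
    return df.assign(total=df["a"] + df["b"])

with Flow("transform-1") as flow:
    df = extract()
    chunks = split(df)
    # transform.map submits one task run per chunk; the DaskExecutor runs them in parallel
    transformed = transform.map(chunks)

if __name__ == "__main__":
    flow.run(executor=DaskExecutor())

(Mapping over the list of chunks is what lets the Dask executor schedule each batch's transform as an independent task run, which is the parallel speedup asked about in question 1.)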
Abhas P
09/02/2021, 8:51 PM

Abhas P
09/03/2021, 5:18 PM