02/23/2021, 2:41 AM
Hi everyone. Very new to Prefect. I have some large files that I want to put through a pipeline, and after setting up the DB tables, etc., I want to process only one row at a time so that I can validate errors, skip dodgy rows, and so on. Speed isn't a huge concern. Does anyone know of a good strategy to achieve this with Prefect?

Chris White

02/23/2021, 4:11 AM
Hi Carl — it sounds like you might benefit from Prefect mapping; in particular, you could have one task that returns row IDs, which a downstream task maps over to process. When using mapping (especially with large numbers of mapped tasks), I highly recommend a Dask executor for parallelism.


02/23/2021, 9:45 AM
Thanks for the hint. I think this comes pretty close to what I need. Now my question is, being used to pandas DataFrames, how might I persist the column names in each iteration? i.e. is there a way to return a row object that keeps the column names, or something similar? Or perhaps an alternative way to make referencing the data in a single row more readable?
import pandas as pd
from prefect import task, Flow, Parameter

@task
def extract_file(filename):
    df = pd.read_csv(filename)
    # Note: df.values drops the column names — each element is a bare list
    return df.values.tolist()

@task
def transform(df_row):
    # do something with a single row
    return df_row

@task
def load(df_row):
    # write the transformed row to the target table
    pass

def build_flow():
    with Flow('Test ETL') as flow:
        filename = Parameter('filename')
        df = extract_file(filename)
        df_row = transform.map(df)
        r = load.map(df_row)
    return flow

flow = build_flow()
flow.run(parameters={'filename': 'data/somefile.csv'})
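One way to keep the column names per row is to have the extract task return a list of dicts via `DataFrame.to_dict("records")` instead of `df.values`, so each mapped task can reference values by column name. A minimal sketch (the CSV contents here are made up for illustration):

```python
import io
import pandas as pd

def extract_rows(csv_source):
    df = pd.read_csv(csv_source)
    # "records" orientation gives one dict per row, keyed by column name,
    # e.g. {"id": 1, "name": "a"} — so column names survive the mapping.
    return df.to_dict("records")

rows = extract_rows(io.StringIO("id,name\n1,a\n2,b"))
```

Inside a mapped `transform`, you can then write `df_row["name"]` instead of indexing by position; `transform.map(rows)` works the same way since it is just a list. `df.itertuples()` is an alternative if you prefer attribute access (`row.name`) over dict lookups.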