Jesse Powell
05/21/2024, 12:36 AM
Dev Dabke
05/23/2024, 7:04 PM
.submit when invoking a task
• I never access a DataFrame in a flow, only in a task
• I generally compute statistics with stats_df = df.describe().reset_index() and, in a task, upload that as an artifact via stats_df.astype(str).to_dict("records") (note: the cast to str silences issues in handling non-primitive types)
Sticking to those rules has resulted in a very ergonomic way of manipulating DataFrame objects in Prefect. Also, one other cool thing: it's very easy to manage caching, which is helpful for long-lived flows that have failure points.
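For anyone following along, here is a minimal sketch of the describe-and-convert step from the message above, using only pandas. The helper name describe_as_records and the sample columns are hypothetical, made up for illustration. Inside a Prefect task, the returned list of dicts is the kind of value that could be passed to create_table_artifact from prefect.artifacts; that wiring is left out here so the snippet runs without a Prefect install.

```python
import pandas as pd


def describe_as_records(df: pd.DataFrame) -> list[dict[str, str]]:
    """Summarize df and return one dict of strings per statistic."""
    # describe() keeps the statistic names (count, mean, ...) in the index;
    # reset_index() moves them into a regular column named "index".
    stats_df = df.describe().reset_index()
    # Cast every cell to str so no non-primitive values (for example NumPy
    # scalars) reach whatever consumes the records.
    return stats_df.astype(str).to_dict("records")


# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"latency_ms": [10.0, 20.0, 30.0], "retries": [0, 1, 2]})
records = describe_as_records(df)

for row in records:
    print(row)
```

Each record holds one statistic (count, mean, std, and so on) with every value already a plain string, so it stays readable in a table view and avoids the type-handling issues mentioned above.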