# data-tricks-and-tips
j
Is there any real reason to manage dataframe-based ML pipelines with something like Kedro or Hamilton? I've just been using Prefect tasks.
d
I personally use Prefect tasks. A few usage notes for me:
• I always use `.submit` when invoking a task
• I never access a `DataFrame` in a flow, only in a task
• I generally compute statistics with `stats_df = df.describe().reset_index()` and upload that as an artifact via `stats_df.astype(str).to_dict("records")` in a task (the cast to `str` silences issues in handling non-primitive types; see the sketch after this list)
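To make that last bullet concrete, here's a minimal sketch, assuming a recent Prefect 2.x that ships `prefect.artifacts.create_table_artifact`; the task name and artifact key are placeholders:

```python
import pandas as pd
from prefect import task
from prefect.artifacts import create_table_artifact


@task
def report_stats(df: pd.DataFrame) -> None:
    # Summary statistics as a plain table, one row per statistic.
    stats_df = df.describe().reset_index()
    # Cast to str so non-primitive cells (e.g. Timestamps) don't
    # trip up artifact serialization.
    create_table_artifact(
        key="summary-stats",  # hypothetical key
        table=stats_df.astype(str).to_dict("records"),
        description="df.describe() for the current run",
    )
```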
Sticking to those rules has resulted in a very ergonomic way of manipulating `DataFrame` objects in Prefect; a full flow that follows them might look like the sketch below.
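A minimal end-to-end sketch of those rules, assuming Prefect 2.x; the task names, the CSV path, and the "training" step are hypothetical:

```python
import pandas as pd
from prefect import flow, task


@task
def load_data(path: str) -> pd.DataFrame:
    # All DataFrame access lives in tasks, never in the flow body.
    return pd.read_csv(path)


@task
def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna()


@task
def train(df: pd.DataFrame) -> float:
    # Placeholder for real model training.
    return df.shape[0] / 1000.0


@flow
def pipeline(path: str = "data/raw.csv"):  # hypothetical path
    # .submit returns a PrefectFuture; passing futures between tasks
    # lets Prefect resolve them and track the dependency graph.
    raw = load_data.submit(path)
    cleaned = clean_data.submit(raw)
    score = train.submit(cleaned)
    return score.result()


if __name__ == "__main__":
    pipeline()
```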
Also, one other cool thing: it's very easy to manage caching, which is helpful for long-lived flows that have failure points.
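A minimal caching sketch, assuming Prefect 2.x's `task_input_hash` with the `cache_key_fn`/`cache_expiration` task options; the task body is a placeholder:

```python
from datetime import timedelta

import pandas as pd
from prefect import task
from prefect.tasks import task_input_hash


@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(days=1))
def expensive_transform(path: str) -> pd.DataFrame:
    # If a downstream task fails and the flow is re-run, this result
    # is served from the cache (keyed on the inputs) instead of being
    # recomputed.
    return pd.read_csv(path).dropna()
```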