# prefect-community
c
What's the recommended way to pass pandas DFs between tasks? With big DFs (1GB+) serialization is taking a lot of time.
👀 1
a
Wondering the same
j
Kind of new to Prefect, but from my experience working with orchestration tools (like Prefect, Airflow), they should be used more to orchestrate and less to transform data (especially when data starts to get big). You could leverage Prefect + Spark, for example, as an alternative. For your use case specifically, what you can also try is to persist the data in an intermediate storage layer (like s3, gcs) using parquet, and instead of passing the whole dataframe between tasks you pass the file path to the next task
a
@Joao Moniz What would you recommend for the case where the data clearly fits in memory, but Prefect is starting to give some problems? Spark feels like an unnecessary complication that will probably make the code run slower
j
Hi Andreas. Not sure; if it's something related to Prefect, it might be better to start a new thread with the logs, since someone from their team might be able to provide technical help. I was speaking more about general architecture, but you are right: if the data is small (fits in memory), it doesn't make sense to add Spark's complexity to it.
c
I am going to try caching data via parquet and not rely on Prefect
My DFs do fit in memory (big servers), but moving data between workers in Dask might be my bottleneck
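A quick way to see why this helps: compare the serialized size of the DataFrame itself (what gets shipped when a task returns the frame) against the serialized size of a path string. The path below is hypothetical and the frame is synthetic; this just quantifies the gap.

```python
import pickle

import numpy as np
import pandas as pd

# A modest synthetic frame; a real 1GB+ frame only widens the gap.
df = pd.DataFrame(np.random.rand(100_000, 10))

# What crosses the wire if the task returns the DataFrame itself.
frame_payload = pickle.dumps(df)

# What crosses the wire if the task returns a path (hypothetical location).
path_payload = pickle.dumps("/tmp/results/df.parquet")

print(f"frame: {len(frame_payload):,} bytes, path: {len(path_payload)} bytes")
```

The frame payload is megabytes while the path payload is tens of bytes, which is why caching to parquet and passing paths sidesteps the serialization and worker-to-worker transfer cost.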
👍 1
j
This tracking PR on Results encompasses a good number of PRs to improve results. The improvements there might be helpful. @Zanie might have more insight.
c
Nice, will follow. Thanks!
👍 1