Hi!
I recently ran into memory issues when running Prefect with pandas.
A subflow gets stuck in Running, and I received a crash notification for the top-level flow:
State message: Flow run infrastructure exited with non-zero status code -9.
All tasks in the subflow that gets stuck in Running succeed, but the process somehow runs out of memory before the subflow itself completes.
It seems that the process of determining flow final state takes up a lot of memory when many tasks return pandas dataframes.
Any suggestions?
Nelson Griffiths
09/07/2023, 1:11 PM
@Andreas Nord I would recommend considering a switch from pandas to polars. We run all our flows on Google Cloud Run and switched to polars. It runs much faster and uses a lot less memory than our old flows did. https://www.pola.rs/
Nelson Griffiths
09/07/2023, 1:11 PM
In my experience it has worked nicely with our Prefect setup for data pipelines
Andreas Nord
09/07/2023, 1:44 PM
I'm aware of this library, but porting all the code is quite a big project. I need a short- to medium-term solution.
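One short-term option that stays within the existing code, sketched under assumptions: if the memory growth really does come from every returned DataFrame being held until the flow finishes, a step could write its result to disk and return only the file path. The helper names below (`spill_to_disk`, `load_from_disk`, `produce_rows`, `total_value`) are hypothetical, and a plain list of dicts stands in for a DataFrame so the sketch runs with the standard library alone. With pandas, `DataFrame.to_parquet` and `pandas.read_parquet` could play the same role.

```python
import pickle
import tempfile
from pathlib import Path
from uuid import uuid4


def spill_to_disk(obj, directory):
    """Write obj to a new file in directory and return the file's path."""
    path = Path(directory) / f"{uuid4().hex}.pkl"
    with path.open("wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_from_disk(path):
    """Read back an object written by spill_to_disk."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def produce_rows(n, directory):
    # Hypothetical stand-in for a task that builds a large DataFrame.
    rows = [{"id": i, "value": i * 2} for i in range(n)]
    # Return the small path object instead of the data itself.
    return spill_to_disk(rows, directory)


def total_value(path):
    # Hypothetical downstream step: load the data only when it is needed.
    return sum(row["value"] for row in load_from_disk(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = produce_rows(1000, tmp)
        print(total_value(path))
```

Whether this removes the crash depends on where the memory is actually being held, so it is worth measuring before and after.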