Hello there 👋,
I'm a new user of Prefect, coming from the Airflow world, which I happily used for 3+ years. 🎂
I want to begin by saying that the project, the quality of the code and the quality of the documentation are outstanding 🤩
But I need some help finding the right way to use it and the best practices 🤓
I have a standard ELT flow with a big json file (1GB) as input.
For my task to run successfully on my medium-sized machine, I combine ijson and iterators to read the file and write it to disk chunk by chunk, so I don't overload the memory (I can't fit a 1GB JSON dict in memory).
Then I load the file directly into my DB, without passing it through Python.
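To make the chunked step concrete, here is a simplified sketch rather than my exact code. It assumes the input is a single top-level JSON array and that the output is newline-delimited JSON; the function names `stream_records` and `write_in_chunks`, the chunk size, and the output format are only there for illustration.

```python
import itertools
import json
from pathlib import Path
from typing import Any, Iterable, Iterator


def stream_records(json_path: Path) -> Iterator[Any]:
    """Lazily yield each element of a top-level JSON array."""
    import ijson  # third-party; imported here so the writer below works without it

    with open(json_path, "rb") as f:
        # "item" selects each element of the top-level array.
        # use_float=True (ijson >= 3.1) yields floats instead of Decimal,
        # which json.dumps cannot serialize.
        yield from ijson.items(f, "item", use_float=True)


def write_in_chunks(
    records: Iterable[Any], out_path: Path, chunk_size: int = 10_000
) -> int:
    """Write records as newline-delimited JSON, chunk_size records at a time."""
    it = iter(records)
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # Only one chunk of records is held in memory at any moment.
        while chunk := list(itertools.islice(it, chunk_size)):
            out.writelines(json.dumps(r) + "\n" for r in chunk)
            written += len(chunk)
    return written
```

Something like `write_in_chunks(stream_records(Path("input.json")), Path("output.ndjson"))` would then produce the file that gets bulk-loaded into the DB (both paths are placeholders).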
What is the Prefect way of handling a use case like this? 🤔
Prefect encourages passing data from task to task in memory, but here I offload it to disk and only pass the path of the file between tasks.
Is there a way to pass an iterator between tasks instead of a single object?
One way I'm thinking of industrializing this is to share a file cache between tasks.
What do you guys think about it?