# ask-community
f
Hi everyone, I have a question about CSV streaming: is it possible to do this with Prefect? I have a 24TB file that I want to split into batches, and for each batch run two tasks (some transforming, then writing). The first step would be to stream the CSV file (https://www.heatonresearch.com/content/csv_file.html). The main thing is that I do not want to load everything into RAM!
j
Hey! I’m interested to see if someone in the community knows the best Prefect way. In theory, though, you could download the file in batches of 100MB and process each one. Alternatively, I would suggest checking out Dataflow, a product designed by Google for batch/stream processing. It would be capable of downloading the file and splitting the work automatically over multiple workers.
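To make the "process it in batches" idea concrete, here is a minimal sketch in plain Python (standard library only, not a Prefect API). It reads a local CSV a fixed number of rows at a time, so only one batch is held in memory. The names `iter_batches`, `transform`, `write_batch` and `process` are hypothetical, the uppercase transform is a placeholder, and the sketch assumes the first row is a header that is not copied to the output.

```python
import csv
from itertools import islice
from typing import Iterator, List

Row = List[str]


def iter_batches(path: str, batch_size: int) -> Iterator[List[Row]]:
    """Yield lists of at most batch_size rows; one batch in memory at a time."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # assumption: first row is a header, skipped
        if header is None:
            return  # empty file, nothing to yield
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            yield batch


def transform(batch: List[Row]) -> List[Row]:
    """Placeholder transform (hypothetical): uppercase every field."""
    return [[field.upper() for field in row] for row in batch]


def write_batch(batch: List[Row], out_path: str) -> None:
    """Append one transformed batch to the output file."""
    with open(out_path, "a", newline="") as f:
        csv.writer(f).writerows(batch)


def process(in_path: str, out_path: str, batch_size: int = 10_000) -> int:
    """Run transform then write for each batch; return the number of batches."""
    count = 0
    for batch in iter_batches(in_path, batch_size):
        write_batch(transform(batch), out_path)
        count += 1
    return count
```

Something like `process("big.csv", "out.csv", batch_size=100_000)` would then walk the file batch by batch. Whether each batch should become its own Prefect task run is a separate question that this sketch does not answer.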
k
I think you’ll likely run into memory issues with Prefect. I think you should use Dask itself to process that and have the file partitioned beforehand. The Dataflow suggestion above looks like a good one.
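As a rough illustration of "partitioned beforehand", the sketch below splits one large CSV into smaller part files, each with the header repeated, using only the Python standard library. It is not Dask code. The function name `partition_csv`, the `part-00000.csv` naming scheme and the rows-per-part setting are all assumptions made for the example.

```python
import csv
import os
from itertools import islice
from typing import List


def partition_csv(src: str, out_dir: str, rows_per_part: int) -> List[str]:
    """Split src into part files of at most rows_per_part data rows each.

    The header row is repeated in every part so each file can be read alone.
    Only one part's rows are held in memory at a time.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # assumption: first row is a header
        if header is None:
            return paths  # empty input, no parts written
        while True:
            rows = list(islice(reader, rows_per_part))
            if not rows:
                break
            # Hypothetical naming scheme: part-00000.csv, part-00001.csv, ...
            part_path = os.path.join(out_dir, f"part-{len(paths):05d}.csv")
            with open(part_path, "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(rows)
            paths.append(part_path)
    return paths
```

Part files like these can be handed to independent workers. Dask's `dask.dataframe.read_csv` also accepts a glob pattern such as `parts/part-*.csv`, so the parts could be loaded lazily as one dataframe.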