Hey folks, I'm trying to do the following but can't figure it out. Any pointers would be appreciated.
I want to build a pipeline that loads a file larger than the RAM my local machine has. I can do that with Dask alone, but I'm not sure how to do it inside a Prefect task.
In other words, is it possible for the first task of a flow to load one chunk of a dataframe, and once that chunk is processed, load the next chunk, and so on? Thanks.
Kyle McChesney
07/29/2021, 3:29 PM
I am still very new to this, but it seems like you want to use Task.map and be sure to start your flow with the Dask executor.
👍 1
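In case it helps, here is a minimal sketch of that idea in Prefect 1.x style (current as of this thread). The chunk size, file name, and processing step are all hypothetical placeholders.

```python
import pandas as pd
from prefect import Flow, task, unmapped
from prefect.executors import DaskExecutor

CHUNK_SIZE = 100_000  # rows per chunk; tune so one chunk fits in memory


@task
def chunk_starts(path: str):
    # Return only the starting row index of each chunk, not the data,
    # so the whole file never has to sit in memory at once.
    with open(path) as f:
        total_rows = sum(1 for _ in f) - 1  # minus the header row
    return list(range(0, total_rows, CHUNK_SIZE))


@task
def process_chunk(path: str, start: int):
    # Read just this slice of the CSV and process it.
    chunk = pd.read_csv(path, skiprows=range(1, start + 1), nrows=CHUNK_SIZE)
    return len(chunk)  # stand-in for the real processing


with Flow("chunked-load") as flow:
    starts = chunk_starts("big_file.csv")
    process_chunk.map(path=unmapped("big_file.csv"), start=starts)

# The DaskExecutor runs the mapped chunk tasks in parallel on a local Dask cluster
flow.run(executor=DaskExecutor())
```

Each mapped task only ever holds one chunk, so memory stays bounded even though the whole file gets processed.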
Kevin Kho
07/29/2021, 3:32 PM
Hey @Rutvik Patel, this is how you would work with Dask Dataframes in a Prefect task
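A minimal sketch of that pattern, assuming a CSV input; the file path, blocksize, and column name are placeholders, not anything specific from the docs.

```python
import dask.dataframe as dd
from prefect import Flow, task


@task
def summarize(path: str):
    # dd.read_csv builds a lazy, partitioned dataframe; partitions are
    # streamed through memory when .compute() runs, so the full file
    # never needs to fit in RAM at once.
    ddf = dd.read_csv(path, blocksize="64MB")
    return ddf.groupby("some_column").size().compute()


with Flow("dask-dataframe-in-task") as flow:
    summarize("big_file.csv")

flow.run()
```

Here the chunking happens inside a single task via Dask's lazy partitions, rather than through Task.map as in the earlier sketch.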