Has anybody used Arrow for anything with Prefect/D...
# ask-community
w
Has anybody used Arrow for anything with Prefect/Dask? https://arrow.apache.org/docs/python/index.html It seems appealing, I’m just wondering if there are sharp edges.
k
Hey @Wilson Bilkovich, how would you use it? I’m pretty curious too.
w
It seems to come up in conversations about alternatives when you’re outgrowing NumPy arrays, and want some kind of distributed version of same with a compatible-ish API
Which I suppose puts it in competition with Dask’s features in that area
I guess in my use case it would start life as something like https://arrow.apache.org/docs/python/numpy.html#numpy-to-arrow
We’ve got a ton of source data in MySQL, and we need to transform and load it in potentially several different back-ends.. pretty standard story other than the shape of the data.
In that middle transform part, we’re free to do anything we want computationally, so Arrow could be interesting. I hate adding things to the stack but sometimes tools are useful.
k
Yeah it feels like adding more tools to the stack. Would you use Prefect’s
map
operation? I have seen used just have a stack of
pd.DataFrames
that they map over to take advantage of the Dask engine. I think the main thing here is how seamless it is to work with Arrow types on the Dask engine?
w
Yeah. We’d like to have arrays larger than the memory of any single worker node, if it isn’t too devastating to port to
But Dask arrays can do that already to my understanding, so why not just use that etc
k
You can use Dask with Prefect like this also
w
Aha, that makes sense.