Hi, another newb question: What is the reasonable ...
# ask-community
l
Hi, another newb question: What is a reasonable use of map()? Is using map() to handle a list of thousands (or millions?) of elements one by one reasonable? Should I micro-batch it into chunks? Does Prefect start to choke if you have many, many tasks in a flow? What about the Dashboard when you examine such a flow after it has run?
n
Hi @Luis Muniz - not a newb question at all. .map is great for handling lists in the thousands, but I think as you go beyond that (or even at that) batching becomes really valuable. As you scale up, you'll want to take a look at parallelization and depth-first execution on Dask; you'll see really improved performance with the latter in particular.
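For illustration, here is a minimal sketch of the micro-batching idea in plain Python, so it does not depend on any particular Prefect version. The helper `chunked`, the stand-in `process_chunk`, and the batch size of 500 are all hypothetical names and values chosen for this example. In a real flow, `process_chunk` would presumably be a task and the list comprehension would become a mapped call over `batches`.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_chunk(chunk: List[int]) -> int:
    # Hypothetical per-batch work; stands in for the body of a mapped task.
    return sum(x * x for x in chunk)


elements = list(range(10_000))

# 10,000 elements become 20 batches, so only 20 units of work are mapped
# instead of 10,000.
batches = list(chunked(elements, 500))

# In a flow this loop would be the mapped step (one task run per batch).
results = [process_chunk(batch) for batch in batches]
total = sum(results)
```

The trade-off is granularity: larger batches mean fewer task runs to schedule and display, but a failure or retry then applies to a whole batch rather than a single element.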
l
ok, thanks, I really appreciate it. The tutorial example scraping movie scripts seemed to indicate that this is a viable pattern, but somehow my "unicorns don't exist" radar was flashing pink.
n
To my knowledge so far, we've had users with mapped tasks in the hundreds of thousands without issue; if you find you're having trouble you can definitely report that to us and we'll be happy to help 🙂
l
I have to say that the advanced techniques you mention are unknown to me; I would appreciate some pointers that would help me learn about them.
n
Definitely, one sec and let me find some links (meant to include those in the initial message, mb)
l
Thanks a million. What a great community!