# prefect-community
I'm downloading ~20K images with asyncio. I have a flow called download_images which calls a function (not a task) to download each image. I've tried adding a task decorator to the download function (I'd like to use caching), but I have the feeling that it significantly slows down the whole download. Is it correct that adding a task decorator adds overhead, especially at this scale?
There would be more overhead than with a plain Python function. I can't say for sure how much it would slow things down, but you could test both versions on a small sample of images and see if there is any significant difference.
For sure, each task run also involves API calls across the network. With one or two tasks this might be minimal, but at 20K scale it can certainly add up. How slow is it? Do you have timings with and without the task decorator? And how long would it take to re-download everything after a failure, compared with having each download's state cached so it is not downloaded again? It might take some trial and error to find the optimal setup.
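A minimal sketch of the comparison suggested above, assuming the download function is an async callable that takes a URL. `download_image`, `time_batch`, `compare`, and the example URLs are hypothetical names for illustration, and the stand-in download does no real network I/O. To measure actual overhead, pass the plain function as `plain` and the task-decorated version as `wrapped`, and run both inside whatever context the decorated version requires.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Tuple

AsyncDownload = Callable[[str], Awaitable[bytes]]


async def download_image(url: str) -> bytes:
    """Hypothetical stand-in for the real download function."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return url.encode()


async def time_batch(fn: AsyncDownload, urls: List[str]) -> Tuple[float, List[bytes]]:
    """Run fn concurrently over urls and return (elapsed seconds, results)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fn(u) for u in urls))
    return time.perf_counter() - start, list(results)


async def compare(plain: AsyncDownload, wrapped: AsyncDownload,
                  urls: List[str]) -> Dict[str, float]:
    """Time both variants on the same sample and estimate per-call overhead."""
    plain_s, _ = await time_batch(plain, urls)
    wrapped_s, _ = await time_batch(wrapped, urls)
    return {
        "plain_s": plain_s,
        "wrapped_s": wrapped_s,
        "per_call_overhead_s": (wrapped_s - plain_s) / len(urls),
    }


if __name__ == "__main__":
    # A small sample rather than all ~20K images.
    sample_urls = [f"https://example.com/img_{i}.jpg" for i in range(200)]
    # Replace the second argument with the task-decorated function.
    print(asyncio.run(compare(download_image, download_image, sample_urls)))
```

Multiplying `per_call_overhead_s` by the full image count gives a rough estimate of the total added time, which can then be weighed against the time a full re-download would cost without caching.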
Thanks to both of you for your input! I'm going to run some performance tests next week and will report the results 🙂