I'm downloading ~20K images with asyncio. I have a flow called download_images which calls a function (not a task) to download each image. I've tried to add a task decorator (I'd like to use caching) to the download function, but I have the feeling that it significantly slows down the process of downloading all images. Is it correct that adding a task decorator adds overhead (especially on this scale)?
James Sopkin
02/03/2023, 3:45 PM
There would be more overhead than with a plain Python function. I can't say for sure how much it would slow things down, but you could test with and without the decorator on a small sample of images and see whether there is any significant difference.
Christopher Boyd
02/03/2023, 4:04 PM
For sure. Each task run also makes API calls across the network. With one or two tasks the overhead is minimal, but at 20K scale it can certainly add up.
How slow is it?
Do you have performance examples with / without the task decorator for time taken?
How long would it take to re-download everything after a failure, versus caching the state of each download so that nothing is re-downloaded?
This might take some trial and error to find the optimal setup.
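A minimal sketch of the with/without timing comparison suggested above, using only the standard library. Everything here is an assumption for illustration: `fake_download` stands in for the real download function, `with_overhead` simulates per-call orchestration cost with a fixed delay, and the sample size of 200 is arbitrary. In a real test, the task-decorated download function would take the place of the wrapped one.

```python
import asyncio
import time


async def fake_download(url: str) -> bytes:
    # Hypothetical stand-in for the real download; sleeps instead of doing network I/O.
    await asyncio.sleep(0.001)
    return url.encode()


def with_overhead(download, delay: float):
    # Hypothetical wrapper that adds a fixed delay per call to mimic
    # decorator/orchestration overhead. Not how any real library measures it.
    async def wrapped(url: str) -> bytes:
        await asyncio.sleep(delay)
        return await download(url)

    return wrapped


async def time_downloads(download, urls) -> float:
    """Run `download` concurrently over `urls` and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(download(u) for u in urls))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Small sample first, before committing to the full ~20K run.
    sample = [f"https://example.com/img/{i}.jpg" for i in range(200)]
    plain = asyncio.run(time_downloads(fake_download, sample))
    wrapped = asyncio.run(time_downloads(with_overhead(fake_download, 0.005), sample))
    print(f"plain:   {plain:.3f}s")
    print(f"wrapped: {wrapped:.3f}s")
```

Running the same harness at a few sample sizes (say 100, 1,000, 5,000) would show whether the gap grows with the number of calls.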
Nils
02/03/2023, 4:06 PM
Thanks to both of you for your input! I'm going to run some performance tests next week and will report the results 🙂