# best-practices
k
Hey everyone! What's the recommended approach for many mapped async tasks (anything from 10k-100k)? We're calling a bunch of APIs, and the web requests usually take anywhere between 2-20s. I've tried creating batches of these tasks and then calling `.map` for all the tasks in a batch. While this works, it feels very hacky and isn't ideal for performance. Any help is appreciated.
z
Hey! I’m working on some improvements in this area. What kind of problems are you running into? What kind of HTTP client are you using?
k
We're using a wrapper around the `requests` module. At the moment the main issue is with the time taken for each task. Sometimes the task is over within 10-20s, other times it takes 2-5 mins.
z
If you’re writing async tasks, you should use an async client like httpx.
Although it depends on what your wrapper looks like. I’m guessing you’re sending requests in threads?
k
I don't think we're using threads, but I'm not sure about it. The wrapper essentially calls `requests.request`.
z
Oh, that’ll be bad for the event loop — it’ll scale very poorly that way.
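To illustrate the point about the event loop, here is a minimal sketch using only the standard library. It assumes `time.sleep` as a stand-in for a blocking `requests` call and `asyncio.sleep` as a stand-in for an awaitable client call; the function names are hypothetical and not part of the wrapper discussed in this thread.

```python
import asyncio
import time


async def blocking_task(delay):
    # Stand-in for a synchronous HTTP call inside an async function:
    # nothing else on the event loop can run while this sleeps.
    time.sleep(delay)


async def cooperative_task(delay):
    # Stand-in for an awaitable HTTP call: control returns to the
    # event loop while waiting, so other tasks can make progress.
    await asyncio.sleep(delay)


async def timed(task_fn, n, delay):
    # Run n copies of task_fn "concurrently" and return elapsed seconds.
    start = time.perf_counter()
    await asyncio.gather(*(task_fn(delay) for _ in range(n)))
    return time.perf_counter() - start


def compare(n=5, delay=0.05):
    # Returns (elapsed with blocking calls, elapsed with awaitable calls).
    blocking = asyncio.run(timed(blocking_task, n, delay))
    cooperative = asyncio.run(timed(cooperative_task, n, delay))
    return blocking, cooperative


if __name__ == "__main__":
    blocking, cooperative = compare()
    print(f"blocking: {blocking:.2f}s, cooperative: {cooperative:.2f}s")
```

With blocking calls the waits run one after another, so the total grows with the number of tasks; with awaitable calls the waits overlap.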
k
Thanks for the heads up, @Zanie. From what I can tell, when we previously used `concurrent.futures.ThreadPoolExecutor`, our `requests` wrapper performed maybe 5-20 times faster than it does now via Prefect tasks. I'm considering migrating from `requests` to `httpx`. I'd really appreciate it if you could give a rough estimate of the expected performance increase.
z
I can’t give an estimate — but performing synchronous IO in an asynchronous context is very bad for async performance.
Running each request in a Prefect task will always be slower, though: each task has to be orchestrated, which takes a few API calls, so you’re increasing the total number of requests.
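One way to act on both points is to handle many requests inside a single coroutine, keeping the synchronous wrapper off the event loop and capping how many calls run at once. The sketch below is a rough illustration under stated assumptions, not Prefect's API: `call_api` is a hypothetical stand-in for the `requests`-based wrapper, `max_concurrency` is an arbitrary illustrative limit, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


def call_api(item):
    """Hypothetical stand-in for the synchronous requests-based wrapper."""
    time.sleep(0.02)  # simulates a blocking HTTP round trip
    return item * 2


async def fetch_all(items, max_concurrency=10):
    # The semaphore caps how many calls are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(item):
        async with semaphore:
            # Run the blocking call in a worker thread so the
            # event loop stays free for other coroutines.
            return await asyncio.to_thread(call_api, item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(fetch_one(item) for item in items))


def run_batch(items, max_concurrency=10):
    # Synchronous entry point for trying the sketch outside an event loop.
    return asyncio.run(fetch_all(items, max_concurrency))


if __name__ == "__main__":
    print(run_batch(list(range(20))))
```

A coroutine like `fetch_all` could handle a whole batch of requests in one task, so the orchestration cost is paid once per batch rather than once per request. Whether that trade-off suits a given flow depends on how much per-request tracking is needed.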
k
Thank you for the heads up!