Hey everyone!
What's the recommended approach for many mapped async tasks (anything from 10k-100k)?
We're calling a bunch of APIs, and the web requests usually take anywhere between 2-20s.
I've tried creating batches of these tasks and then calling `.map` for all the tasks in a batch.
While this works, it feels very hacky and isn't ideal for performance.
Any help is appreciated
Zanie
01/25/2023, 4:26 PM
Hey! I’m working on some improvements in this area. What kind of problems are you running into? What kind of HTTP client are you using?
Kelvin DeCosta
01/26/2023, 2:32 PM
We're using a wrapper around the `requests` module.
At the moment the main issue is with the time taken for each task.
Sometimes the task is over within 10-20s, other times it takes 2-5 mins.
Zanie
01/26/2023, 4:14 PM
If you’re writing async tasks, you should use an async client like httpx
Zanie
01/26/2023, 4:15 PM
It depends on what your wrapper looks like, though. I’m guessing you’re sending requests in threads?
Kelvin DeCosta
01/27/2023, 6:11 AM
I don't think we're using threads, but I'm not sure.
The wrapper essentially calls `requests.request`.
Zanie
01/27/2023, 3:44 PM
Oh, that’ll be bad for the event loop — it’ll scale very poorly that way.
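To illustrate why, here is a small sketch that uses `time.sleep` as a stand-in for a blocking `requests.request` call and `asyncio.sleep` as a stand-in for an awaited async client call. The function names and delays are made up for the demonstration:

```python
import asyncio
import time
from typing import Awaitable, Callable, Tuple


async def blocking_call(delay: float) -> None:
    # Stand-in for requests.request(...): blocks the event loop thread,
    # so no other coroutine can make progress until it returns.
    time.sleep(delay)


async def non_blocking_call(delay: float) -> None:
    # Stand-in for an awaited async client call: yields to the event loop.
    await asyncio.sleep(delay)


async def timed(
    make_call: Callable[[float], Awaitable[None]], n: int, delay: float
) -> float:
    """Run n calls "concurrently" and return the elapsed wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(make_call(delay) for _ in range(n)))
    return time.perf_counter() - start


def compare(n: int = 5, delay: float = 0.1) -> Tuple[float, float]:
    blocking = asyncio.run(timed(blocking_call, n, delay))
    non_blocking = asyncio.run(timed(non_blocking_call, n, delay))
    return blocking, non_blocking
```

With the blocking version, the total time grows roughly as `n * delay` because the calls run one after another; with the non-blocking version it stays close to a single `delay`.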
Kelvin DeCosta
02/01/2023, 7:41 AM
Thanks for the heads up @Zanie
From what I can tell, when previously using `concurrent.futures.ThreadPoolExecutor`, our `requests` wrapper performed maybe 5-20 times faster than it does now via Prefect tasks.
I'm considering migrating from `requests` to `httpx`.
I'd really appreciate it if you could give a rough estimate of the expected performance increase.
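As an interim option before migrating, one possibility is to keep the synchronous wrapper but hand it to a thread pool from async code, so the event loop stays free. This is only a sketch: `slow_sync_call` is a hypothetical stand-in for the existing wrapper, and `max_workers=32` is an arbitrary choice:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def slow_sync_call(url: str) -> str:
    # Hypothetical stand-in for the existing wrapper around requests.request.
    time.sleep(0.1)
    return f"done:{url}"


async def run_in_threads(
    func: Callable[[str], str],
    urls: Sequence[str],
    max_workers: int = 32,
) -> List[str]:
    """Run a blocking function for each URL in a thread pool without blocking the loop."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each call runs in a worker thread; the event loop only awaits the results.
        futures = [loop.run_in_executor(pool, func, url) for url in urls]
        return await asyncio.gather(*futures)
```

Results come back in input order. Throughput is bounded by `max_workers`, which is why a native async client tends to scale further for very large fan-outs.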
Zanie
02/01/2023, 3:55 PM
I can’t give an estimate — but performing synchronous IO in an asynchronous context is very bad for async performance.
Zanie
02/01/2023, 3:56 PM
Running each request in a Prefect task will always be slower, though, because each task needs to be orchestrated. That takes a few API calls per task, so you’re increasing the total number of requests.
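A rough way to reason about that overhead is to count orchestration calls as a function of how many requests each task handles. The sketch below is plain Python, not Prefect code; `calls_per_task=3` is a placeholder for "a few", not a measured or documented figure:

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def orchestration_calls(
    n_requests: int, batch_size: int, calls_per_task: int = 3
) -> int:
    """Estimate orchestration API calls when each task handles one batch.

    calls_per_task is an assumed placeholder value.
    """
    n_tasks = -(-n_requests // batch_size)  # ceiling division
    return n_tasks * calls_per_task
```

Under that assumption, 100,000 requests at one request per task would cost 300,000 orchestration calls, while batches of 500 would cost 600. The trade-off is coarser retries and less per-request visibility in the UI.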