Pedro Machado

04/28/2020, 2:03 AM
Hi everyone. What is the recommended pattern to rate limit access to an API? Suppose you have a task that returns items and you want to map that output to a task that queries the API for each item. What is the recommended way to rate limit those calls to the API? Thanks!
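A minimal sketch of the pattern being described, assuming Prefect's 0.x API (the task names, items, and return values are placeholders):
```python
from prefect import task, Flow

@task
def get_items():
    # Placeholder: return the list of items to look up
    return ["item-1", "item-2", "item-3"]

@task
def query_api(item):
    # Placeholder: one API call per mapped item
    return f"result for {item}"

with Flow("api-mapping") as flow:
    items = get_items()
    results = query_api.map(items)
```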

nicholas

04/28/2020, 2:10 AM
Hi @Pedro Machado, I think the easiest way to accomplish something like that would be to use a rate limiting library like https://pypi.org/project/ratelimit/, which has a really nifty function decorator 🙂
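For example, a rough sketch of what that could look like with the ratelimit package's limits and sleep_and_retry decorators wrapped around the API call inside a Prefect task (the 30-calls-per-minute figure is just a placeholder):
```python
from prefect import task
from ratelimit import limits, sleep_and_retry

ONE_MINUTE = 60

@sleep_and_retry                      # sleep until a call slot frees up
@limits(calls=30, period=ONE_MINUTE)  # at most 30 calls per minute
def call_api(item):
    # Placeholder for the real HTTP request
    return {"item": item}

@task
def query_api(item):
    return call_api(item)
```
Note that a decorator like this only counts calls within a single process, which is exactly the caveat raised below.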

Pedro Machado

04/28/2020, 2:21 AM
I normally use something like this, but I am trying to wrap my mind around the concept of these Prefect mapped tasks, which I understand could be executed in parallel in separate processes. In that case the rate limit would not be applied across all tasks, only within each process. I have an Airflow DAG that has to call an API 20k times. The problem I have now is that a single task makes these 25k API calls, and if that task fails, I have to repeat all the calls. I was thinking that in Prefect this could be a mapped task; then the failure of a single API call would result in just that task failing and retrying, but I am wondering about the rate limit issue. My code has retry logic that handles transient errors, so this may not be the best example of a potential task failure, but in general I am wondering how you'd go about limiting concurrency and rate in a distributed environment without adding something like Redis to the mix.

nicholas

04/28/2020, 2:47 AM
Ahh ok, my apologies! We have a Cloud Platform feature that works something like that, called task concurrency limiting; it works by "tagging" a task and providing a global concurrency limit for that tag. Prefect then ensures that no more than the configured number of tasks with that tag run at any given time. However, I'm not sure that exactly fits your use case, which, if I understand correctly, is retrying mapped tasks that fail without having to retry every task in the mapped list. In that case, you can set the max retries and retry delay directly on the task, and each failed mapped task will retry on its own (without impacting the other tasks in the set). More details on that here. Let me know if that's a little closer to what you were thinking!
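A rough sketch of those two pieces together, assuming Prefect's 0.x task API (the tag name, retry settings, and task bodies are placeholders; the concurrency limit itself would be configured in Prefect Cloud against the tag):
```python
from datetime import timedelta
from prefect import task, Flow

@task
def get_items():
    return ["item-1", "item-2", "item-3"]

# The "api" tag is a placeholder; its concurrency limit is set in Prefect Cloud
@task(max_retries=3, retry_delay=timedelta(seconds=30), tags=["api"])
def query_api(item):
    # Each mapped child fails and retries independently
    return f"result for {item}"

with Flow("rate-limited-api") as flow:
    results = query_api.map(get_items())
```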

Pedro Machado

04/28/2020, 3:23 AM
This gets me closer. I really like the Prefect pattern of mapped tasks. I'd just have to figure out a way to rate limit across parallel tasks.
Thanks!

nicholas

04/28/2020, 3:32 AM
Happy to help! 😄