Thread
#prefect-community
    Pedro Machado

    10 months ago
    Hi there. I am working on a flow that needs to query an API endpoint 60k times. The API is slow, so I need to use concurrency. I have a Python class that uses a requests session to make the API requests. This class also implements rate limiting. I'd like to confirm that if I use the LocalDaskExecutor with threads, I can pass a single instance of the class to a mapped task and it will effectively rate limit across all mapped tasks. Also, is there a benefit to using a resource manager task to instantiate the class that queries the API?
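    A minimal sketch of the kind of thread-safe, rate-limited client being described (the class and method names here are illustrative assumptions, not the actual code):

    ```python
    # Illustrative sketch only: a client that enforces a minimum interval
    # between requests across all threads sharing the same instance.
    import threading
    import time

    class RateLimitedClient:
        """Serializes request timing so concurrent threads share one budget."""

        def __init__(self, min_interval: float):
            self.min_interval = min_interval
            self._lock = threading.Lock()
            self._last_call = 0.0

        def acquire(self):
            # Hold the lock while checking/updating the last-call timestamp,
            # so only one thread at a time can schedule the next request.
            with self._lock:
                wait = self._last_call + self.min_interval - time.monotonic()
                if wait > 0:
                    time.sleep(wait)
                self._last_call = time.monotonic()

        def get(self, session, url):
            # `session` would be a requests.Session in the real flow.
            self.acquire()
            return session.get(url)
    ```

    With threads in a single process, the shared lock is visible to every caller of the same instance; whether Prefect's execution model lets mapped tasks share that instance is the question below.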
    Anna Geller

    10 months ago
    You should consider Prefect tasks stateless. Therefore, afaik stateful rate limiting logic (as a class) cannot be shared across tasks unless you use some external data store (say, Redis) to store the rate limiting data, so that each mapped task can look it up individually before making a request. A resource manager wouldn't help here either, because the connection or class you would use the resource manager task for cannot be shared across mapped tasks either.
    @Pedro Machado did you consider the KV Store or Prefect context to temporarily store stateful information for your use case?
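    A sketch of the external-store idea: each mapped task checks a shared counter before making a request. The in-memory dict below stands in for Redis (in a real deployment you would use redis-py's INCR and EXPIRE against a key per time window; the key format is an assumption):

    ```python
    # Fixed-window rate limit check against a shared counter store.
    # WindowCounterStore is a dict-backed stand-in for Redis INCR.
    import time

    class WindowCounterStore:
        """In-memory stand-in for a shared counter store like Redis."""

        def __init__(self):
            self._data = {}

        def incr(self, key):
            self._data[key] = self._data.get(key, 0) + 1
            return self._data[key]

    def allow_request(store, limit, window_seconds=1, now=None):
        # Count requests in the current time bucket; allow if under the limit.
        if now is None:
            now = time.time()
        bucket = int(now // window_seconds)
        return store.incr(f"api:requests:{bucket}") <= limit
    ```

    Each mapped task would call `allow_request` (against the real Redis) before hitting the API, and sleep or retry when it returns False.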
    Pedro Machado

    10 months ago
    Hi Anna. Thanks for your input. I have not used the KV store yet. Is it meant to be used at a high frequency like Redis? My use case does not involve a huge number of calls, but it would be in that 60k-per-flow-run/week range. I'd need to think about a good way to use a KV structure to rate limit across multiple tasks. I've seen some algorithms based on Redis, but those use other data structures (not the plain KV data structure). Let me know if you have any ideas.
    Anna Geller

    10 months ago
    atm I don’t 😄 The KV Store is indeed more for simple lookups, not for massive parallel writes. But Redis is definitely something I would think of for this type of task. You can then look up either the number of requests sent during a specific time interval, or store rate limiting information returned by the API before making any new request.
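    The second idea — storing rate limiting information returned by the API — can be sketched like this. Many APIs expose headers such as `X-RateLimit-Remaining` and `X-RateLimit-Reset` (header names vary per API; these specific names are assumptions):

    ```python
    # Decide how long to pause before the next request, based on the
    # rate limit headers of the previous response. A task could persist
    # these values in Redis so other mapped tasks can see them.
    def seconds_to_wait(headers, now):
        """Return the pause (in seconds) suggested by rate limit headers."""
        remaining = int(headers.get("X-RateLimit-Remaining", 1))
        if remaining > 0:
            return 0.0
        # Out of quota: wait until the reset timestamp, if provided.
        reset_at = float(headers.get("X-RateLimit-Reset", now))
        return max(0.0, reset_at - now)
    ```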
    Pedro Machado

    10 months ago
    I think I'll start simple and add complexity as needed. I'll probably just handle the 429 rate limit responses by waiting, and empirically test how many mapped tasks I can have before I hit the limit. Thanks again!
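    The "handle 429 by waiting" approach might look roughly like this (`fetch` is a stand-in for the real request call, and the retry parameters are arbitrary):

    ```python
    # Retry on HTTP 429, honoring the Retry-After header when present,
    # otherwise falling back to exponential backoff.
    import time

    def get_with_retry(fetch, url, max_retries=5, base_delay=1.0):
        """Call fetch(url), waiting and retrying on 429 responses."""
        for attempt in range(max_retries):
            resp = fetch(url)
            if resp.status_code != 429:
                return resp
            retry_after = resp.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else base_delay * 2 ** attempt
            time.sleep(delay)
        raise RuntimeError(f"still rate limited after {max_retries} attempts")
    ```

    Each mapped task can then call the API through `get_with_retry`, so rate limit pushback is absorbed per task without any shared state.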