Thread
#prefect-community
    Pedro Machado

    10 months ago
    Hi there. I am working on a flow that needs to query an API endpoint 60k times. The API is slow, so I need to use concurrency. I have a Python class that uses a requests session to make the API requests. This class also implements rate limiting. I'd like to confirm that if I use the LocalDaskExecutor with threads, I can pass a single instance of the class to a mapped task and it will effectively rate limit across all mapped tasks. Also, is there a benefit to using a resource manager task to instantiate the class that queries the API?
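    A minimal sketch of the kind of thread-safe, rate-limited client being described (the class and method names here are illustrative assumptions, not the actual code):

    ```python
    # Illustrative sketch only: a client that enforces a minimum interval
    # between requests across all threads sharing the same instance.
    import threading
    import time

    class RateLimitedClient:
        """Serializes request timing so concurrent threads share one budget."""

        def __init__(self, min_interval: float):
            self.min_interval = min_interval
            self._lock = threading.Lock()
            self._last_call = 0.0

        def acquire(self):
            # Hold the lock while checking/updating the last-call timestamp,
            # so only one thread at a time can schedule the next request.
            with self._lock:
                wait = self._last_call + self.min_interval - time.monotonic()
                if wait > 0:
                    time.sleep(wait)
                self._last_call = time.monotonic()

        def get(self, session, url):
            # `session` would be a requests.Session in the real flow.
            self.acquire()
            return session.get(url)
    ```

    With threads in a single process, the shared lock is visible to every caller of the same instance; whether Prefect's execution model lets mapped tasks share that instance is the question below.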
    Anna Geller

    10 months ago
    You should consider Prefect tasks stateless. Therefore, afaik stateful rate limiting logic (as a class) cannot be shared across tasks unless you use some external data store (say, Redis) to store the rate limiting data, so that each mapped task can look it up individually before making a request. A resource manager wouldn't help here either, because the connection or class you would use the resource manager task for cannot be shared across mapped tasks either.
    @Pedro Machado did you consider the KV Store or Prefect context to temporarily store stateful information for your use case?
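    A sketch of the external-store idea: each mapped task checks a shared counter before making a request. The in-memory dict below stands in for Redis (in a real deployment you would use redis-py's INCR and EXPIRE against a key per time window; the key format is an assumption):

    ```python
    # Fixed-window rate limit check against a shared counter store.
    # WindowCounterStore is a dict-backed stand-in for Redis INCR.
    import time

    class WindowCounterStore:
        """In-memory stand-in for a shared counter store like Redis."""

        def __init__(self):
            self._data = {}

        def incr(self, key):
            self._data[key] = self._data.get(key, 0) + 1
            return self._data[key]

    def allow_request(store, limit, window_seconds=1, now=None):
        # Count requests in the current time bucket; allow if under the limit.
        if now is None:
            now = time.time()
        bucket = int(now // window_seconds)
        return store.incr(f"api:requests:{bucket}") <= limit
    ```

    Each mapped task would call `allow_request` (against the real Redis) before hitting the API, and sleep or retry when it returns False.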
    Pedro Machado

    10 months ago
    Hi Anna. Thanks for your input. I have not used the KV store yet. Is it meant to be used at a high frequency like Redis? My use case does not involve a huge number of calls, but it would be in that 60k-per-flow-run/week range. I'd need to think about a good way to use a KV structure to rate limit across multiple tasks. I've seen some algorithms based on Redis, but those use other data structures (not the plain KV data structure). Let me know if you have any ideas.
    Anna Geller

    10 months ago
    atm I don’t 😄 The KV Store is indeed more for simple lookups, not for massive parallel writes. But Redis is definitely something I would think of for this type of task. You can then look up either the number of requests sent during a specific time interval, or store rate limiting information returned by the API before making any new request.
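    The second idea — storing rate limiting information returned by the API — can be sketched like this. Many APIs expose headers such as `X-RateLimit-Remaining` and `X-RateLimit-Reset` (header names vary per API; these specific names are assumptions):

    ```python
    # Decide how long to pause before the next request, based on the
    # rate limit headers of the previous response. A task could persist
    # these values in Redis so other mapped tasks can see them.
    def seconds_to_wait(headers, now):
        """Return the pause (in seconds) suggested by rate limit headers."""
        remaining = int(headers.get("X-RateLimit-Remaining", 1))
        if remaining > 0:
            return 0.0
        # Out of quota: wait until the reset timestamp, if provided.
        reset_at = float(headers.get("X-RateLimit-Reset", now))
        return max(0.0, reset_at - now)
    ```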
    Pedro Machado

    10 months ago
    I think I'll start simple and add complexity as needed. I'll probably just handle the 429 rate limit responses by waiting, and empirically test how many mapped tasks I can have before I hit the limit. Thanks again!
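    The "handle 429 by waiting" approach might look roughly like this (`fetch` is a stand-in for the real request call, and the retry parameters are arbitrary):

    ```python
    # Retry on HTTP 429, honoring the Retry-After header when present,
    # otherwise falling back to exponential backoff.
    import time

    def get_with_retry(fetch, url, max_retries=5, base_delay=1.0):
        """Call fetch(url), waiting and retrying on 429 responses."""
        for attempt in range(max_retries):
            resp = fetch(url)
            if resp.status_code != 429:
                return resp
            retry_after = resp.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else base_delay * 2 ** attempt
            time.sleep(delay)
        raise RuntimeError(f"still rate limited after {max_retries} attempts")
    ```

    Each mapped task can then call the API through `get_with_retry`, so rate limit pushback is absorbed per task without any shared state.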