# prefect-getting-started
c
What’s the standard approach to running thousands of tasks without crashing due to rate limits? I see that the rate limit is 400-2,000 requests per minute. Are you just supposed to throttle your jobs down to this rate? I saw that Prefect has a global concurrency limit feature, but it seems like a weird workaround. To use it effectively I think I’d need to write wrappers around all Prefect API functions to respect the Prefect rate limits. Has someone already done this? Is there a mode I can switch into where the Prefect client throttles itself on all API requests, perhaps respecting a Retry-After header on the 429s? Please let me know if there’s a better place to direct these questions.
👀 1
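For context, the global concurrency limit feature mentioned above gates how many task executions can hold a slot at once, so it throttles your own task bodies rather than the Prefect client's orchestration calls, which is why it can feel like a workaround for API rate limits. A minimal sketch, assuming a limit named "api-calls" has already been created separately with a slot-decay rate; the names and the flow below are placeholders:
```python
from prefect import flow, task
from prefect.concurrency.sync import rate_limit  # global concurrency / rate limit helper

@task
def call_external_api(item: str) -> None:
    # Block until a slot frees up on the "api-calls" global concurrency limit.
    # The limit itself must be created beforehand (with a slot-decay rate);
    # "api-calls" is an illustrative name, not something Prefect provides.
    rate_limit("api-calls")
    ...  # the rate-limited work goes here

@flow
def throttled_flow(items: list[str]) -> None:
    call_external_api.map(items)
```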
k
Hey Cole! You're definitely asking in the right place. I've got some suggestions for your first question, and I'll get them written out as soon as I have a moment!
c
Thanks so much!
The problem seems to go away when self-hosting with `prefect server start`. This unblocks me for now.
e
@Cole Erickson The Prefect client does indeed respect Retry-After and will retry all requests up to a configurable number of times (default 5).
There’s no concept of rate limiting when self-hosting… you have to make sure you run the service at a scale that can handle the requests, otherwise it may fall over.
c
Thanks Emil, good to see that there are retries. It seems the server isn’t responding with a conservative enough Retry-After delay, though, since the workflows are still failing with the default Prefect settings.
e
@Cole Erickson With enough volume of requests, client retries won’t solve the problem. Think of them as a buffer for smaller spikes or transient issues. The Retry-After sent by the server reflects when the next requests would be allowed… that bandwidth can be consumed quickly depending on how many requests are trying to get through.
👍 1
I definitely see your point, but there’s a tradeoff in how long you’re willing to wait on a request. For some users, waiting a long time isn’t desirable, though that might be something nice to consider.
I would recommend setting a high value for `PREFECT_CLIENT_MAX_RETRIES`. I guess another opportunity could be introducing a way to wait for longer than the minimum retry period.
👀 1
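A minimal sketch of that first suggestion, assuming the setting is read from the environment before Prefect is imported; the value 10 is only illustrative, and the same setting can also be stored in a profile with `prefect config set`:
```python
import os

# Raise the client's retry budget for every Prefect API call in this process.
# The default is 5; 10 is an arbitrary example value.
os.environ["PREFECT_CLIENT_MAX_RETRIES"] = "10"

from prefect import flow

@flow
def my_flow() -> None:
    ...  # existing flow code, unchanged
```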
c
I see what you’re saying. Thanks. I think I have three workarounds to pursue:
1. Increase retries as you suggest
2. Application-level code to throttle my `submit` calls
3. Self-host
e
What kind of volume are you planning for? What’s the order of magnitude of flows/tasks and what’s the pattern (bursty, sustained, or something else)?
k
Just throwing this out there: you could combine the methods explained above with the somewhat stricter approach of concurrency limits at the work pool or work queue level. If you have a decent estimate of the orchestration and logging API calls in your flows, you at least have a mechanism for limiting how often retries will be needed.
👀 1
c
That’s an interesting idea, thanks
@Emil Christensen It’s an offline batch job that runs 10k+ tasks per job. Today, we usually run them with 300-1000 workers in GCP Dataflow. First we run one job to download and preprocess satellite images. Each worker node can process only one task at a time due to memory requirements. After the download job is done, we run another Dataflow job with GPUs attached to run an ML model on each image. Naively written, it’s extremely bursty because we just want to run two `task.map`s with a big list of image IDs. However, the speed of the job isn’t critical, so it’s fine to throttle it. We could also reframe it as a streaming computation instead of batch, but I don’t think I’ll need to
Something like:
```python
image_ids = [...]  # 10,000+ image IDs
preprocessed_images = preprocess.map(image_ids)
ml_images = run_ml_model.map(preprocessed_images)
```
👀 1
e
Ah I see… and I’m guessing the processing of each image takes a while? As in… the problem is upfront submission of tasks, not necessarily the required requests over the whole lifespan of the job.
👍 1
c
Yeah, the image processing takes 10 seconds to 10 minutes
I think if I just write a `slow_map` with some sleeping in there, it could take care of this in an acceptable way.
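One possible shape for that slow_map idea, as a sketch only: submit the mapped tasks in chunks and sleep between chunks so the burst of submissions, and the orchestration API calls behind them, gets spread out. The chunk size and pause below are placeholders that would need tuning.
```python
import time
from prefect import flow, task

@task
def preprocess(image_id: int) -> None:
    ...  # download and preprocess one image

@flow
def preprocess_all(image_ids: list[int], batch_size: int = 200, pause_seconds: float = 30.0):
    # Submit in chunks and sleep between chunks instead of mapping everything at once.
    futures = []
    for start in range(0, len(image_ids), batch_size):
        futures.extend(preprocess.map(image_ids[start : start + batch_size]))
        time.sleep(pause_seconds)
    # Wait for all results before the flow finishes.
    return [f.result() for f in futures]
```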
Signing off for today. Thanks for all the help - have a nice weekend!
e
Sounds like a plan! Happy to discuss this more next week. Have a great weekend