# ask-community
l
We are getting some workflow runs failing immediately on Prefect Cloud with just one log message:
March 17th 2020 at 2:32:51am | agent
ERROR 
An error occurred (ThrottlingException) when calling the RunTask operation (reached max retries: 4): Rate exceeded.
z
Hi Luke, sorry you're running into this. Taking a look now.
l
ok, Thanks!
z
Quick follow-up question: is your flow doing anything with AWS?
l
other runs of the same flow (with different parameter values) are failing with a different single log message:
March 17th 2020 at 2:32:59am | agent
ERROR 
list index out of range
Yes, the flow is accessing S3
z
And what does your flow execution setup look like? This looks like something on the AWS end, but trying to narrow down where it could be happening. Happy to take this to DM if you'd rather not share specifics of how you run your work.
l
so my colleague kicked off 365 parameterized runs of the same workflow in a for loop 🙂
😂 2
z
Welp, that could definitely trigger some AWS throttling.
Happy to help with any other questions you have, but that sounds like the culprit.
l
the first task isn't accessing S3 though...
Is the agent being throttled from too many ECR requests?
z
Without knowing more about how you're set up, my guess would be that something in the ECR/ECS or EKS API is what's throwing that.
l
yeah it's a FargateAgent
z
Because it sounds like you're seeing this error message before the flow even has a chance to get submitted.
l
yeah
Ok, I think we'll have to do some negative engineering 😉
maybe sleep between client.create_flow_run calls to space them out.
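A minimal sketch of the "sleep between calls" idea, assuming the runs are submitted from a plain Python loop. The helper name `submit_spaced`, the `delay_seconds` value, and the `create_run` callable are illustrative assumptions; in practice `create_run` might wrap `client.create_flow_run` with the flow's ID already bound.

```python
import time
from typing import Any, Callable, Dict, Iterable, List


def submit_spaced(
    create_run: Callable[..., Any],
    parameter_sets: Iterable[Dict[str, Any]],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Submit one run per parameter set, pausing between submissions.

    `create_run` is a hypothetical callable that accepts `parameters=...`
    and returns a run ID, e.g.
    functools.partial(client.create_flow_run, flow_id=...).
    """
    run_ids = []
    for index, parameters in enumerate(parameter_sets):
        if index > 0:
            # Space out submissions so the agent does not launch
            # every task at the same moment.
            sleep(delay_seconds)
        run_ids.append(create_run(parameters=parameters))
    return run_ids
```

The right delay depends on the account's AWS rate limits, so the 2-second default here is only a placeholder.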
z
That could do the trick in the short term. I also know that when I was in AWS consulting, we could contact support to get certain limits raised. YMMV there.
l
Would it make sense for me to open an issue / feature request to implement retries in the FargateAgent to handle throttling errors from AWS?
z
If nothing else, it's definitely worth discussion! Go for it.
👍 1
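For the feature request, a rough sketch of what retrying on throttling could look like. This is not the FargateAgent's actual code: `call_with_backoff`, `is_throttling_error`, and the set of error codes are assumptions for illustration. It relies only on the fact that botocore's `ClientError` exposes the AWS error code at `exc.response["Error"]["Code"]`.

```python
import random
import time

# Assumed set of error codes to treat as throttling; adjust as needed.
THROTTLE_CODES = {"ThrottlingException", "Throttling", "TooManyRequestsException"}


def is_throttling_error(exc):
    """Return True if the exception carries an AWS throttling error code."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") in THROTTLE_CODES


def call_with_backoff(func, *args, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep, **kwargs):
    """Call func, retrying throttling errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Re-raise anything that is not throttling, or the final failure.
            if not is_throttling_error(exc) or attempt == max_attempts:
                raise
            # Exponential cap: base_delay, 2x, 4x, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter keeps many concurrent callers from retrying in lockstep.
            sleep(random.uniform(0, cap))
```

An agent could wrap its ECS `run_task` call this way, e.g. `call_with_backoff(ecs_client.run_task, **task_kwargs)`, where `ecs_client` and `task_kwargs` are hypothetical names.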