Occasionally I am getting crashes for one of my Fargate depl... (Prefect Community, #ask-community)

James Gatter

03/19/2025, 6:32 PM

Occasionally I am getting crashes for one of my Fargate deployments when I schedule many to run:

Flow run could not be submitted to infrastructure: An error occurred (ClientException) when calling the RegisterTaskDefinition operation: Too many concurrent attempts to create a new revision of the specified family.

I try to schedule my jobs at least a few seconds apart (perhaps that's not enough). I'm also using concurrency limits to queue jobs, so could it be that many late jobs waiting for a concurrency slot are all trying to start at once? Not sure. Any help or advice on avoiding these crashes would be greatly appreciated!

Kevin Grismore

03/19/2025, 6:51 PM

for task definition registration requests, the token bucket holds at most 20 tokens (so a burst of up to 20 requests) but refills at only 1 token per second, so it could be that your bucket of request tokens is almost always nearly empty
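To make the bucket behaviour concrete, here is a minimal token-bucket simulation in Python. It is a simplified model, not AWS's implementation, and it assumes the numbers above (capacity 20, refill 1 token/s) and a bucket that starts full; `TokenBucket` and `count_rejected` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Simplified token bucket: holds at most `capacity` tokens, refills continuously."""
    capacity: float = 20.0
    refill_per_sec: float = 1.0
    tokens: float = 20.0  # assumption: the bucket starts full
    last: float = 0.0

    def try_acquire(self, now: float) -> bool:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # this request would be throttled


def count_rejected(request_times: list[float]) -> int:
    """Count how many requests at the given timestamps (seconds) would be throttled."""
    bucket = TokenBucket()
    return sum(not bucket.try_acquire(t) for t in sorted(request_times))


# 30 registrations at the same instant: the first 20 drain the bucket, the last 10 are throttled.
print(count_rejected([0.0] * 30))  # 10
# 30 registrations spaced 2 s apart: the bucket refills between requests.
print(count_rejected([2.0 * i for i in range(30)]))  # 0
```

The takeaway is that bursts are cheap only up to the bucket's capacity; after that the sustained rate is capped by the refill rate.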

Kevin Grismore

03/19/2025, 6:52 PM

I'm curious whether every run is registering a task def, or only under certain circumstances, like when a deployment config changes. Rerunning the same deployment without any changes shouldn't result in a new task definition being registered
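One way to picture "only register when the config changes" is to fingerprint the task definition and reuse the previous revision when the fingerprint matches. The sketch below is not Prefect's code; `fingerprint`, `TaskDefCache`, and the fake `"family:revision"` ids are hypothetical, and the real RegisterTaskDefinition call is replaced by a counter. It assumes per-run values such as flow parameters are not baked into the task definition.

```python
import hashlib
import json


def fingerprint(task_def: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so key order doesn't change the hash.
    canonical = json.dumps(task_def, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TaskDefCache:
    """Hypothetical in-memory cache mapping family -> (fingerprint, revision id)."""

    def __init__(self) -> None:
        self._by_family: dict[str, tuple[str, str]] = {}
        self.registrations = 0  # how many times the (simulated) register call ran

    def get_or_register(self, family: str, task_def: dict) -> str:
        fp = fingerprint(task_def)
        cached = self._by_family.get(family)
        if cached is not None and cached[0] == fp:
            return cached[1]  # config unchanged: reuse the existing revision
        # Stand-in for a real RegisterTaskDefinition call; returns a fake "family:revision" id.
        self.registrations += 1
        revision_id = f"{family}:{self.registrations}"
        self._by_family[family] = (fp, revision_id)
        return revision_id


base = {"family": "my-flow", "cpu": "1024", "memory": "2048",
        "containerDefinitions": [{"name": "prefect", "image": "my-image:1"}]}
cache = TaskDefCache()
first = cache.get_or_register("my-flow", base)
# Same config in a different key order: no new revision.
second = cache.get_or_register("my-flow", dict(reversed(list(base.items()))))
# A real config change (more CPU) does trigger a registration.
third = cache.get_or_register("my-flow", {**base, "cpu": "2048"})
print(first, second, third, cache.registrations)  # my-flow:1 my-flow:1 my-flow:2 2
```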

Kevin Grismore

03/19/2025, 6:53 PM

there are some strategies for handling this, but they all revolve around minimizing the number of registrations by keeping the config stable or discoverable
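A related technique (my own illustration, not something Prefect is confirmed to do) is "single-flight" registration: when many runs of the same family start at once, only one caller registers and the rest reuse its result, so there are never several concurrent attempts to create a revision of the same family. This sketch only works if all registrations go through one process; `SingleFlightRegistrar` and `fake_register` are hypothetical.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class SingleFlightRegistrar:
    """Hypothetical helper: concurrent callers asking for the same (family, config)
    share one registration instead of each creating a new revision."""

    def __init__(self, register_fn):
        self._register_fn = register_fn          # stand-in for the real registration call
        self._family_locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()           # protects the defaultdict itself
        self._known = {}                         # (family, config_key) -> revision id

    def _lock_for(self, family):
        with self._guard:
            return self._family_locks[family]

    def get_or_register(self, family, config_key):
        key = (family, config_key)
        # Only one registration per family may be in flight at a time.
        with self._lock_for(family):
            # Re-check inside the lock: another thread may have just registered it.
            if key not in self._known:
                self._known[key] = self._register_fn(family)
            return self._known[key]


calls = []


def fake_register(family):
    # Simulated registration; a real one would call the ECS API.
    calls.append(family)
    return f"{family}:{len(calls)}"


registrar = SingleFlightRegistrar(fake_register)
with ThreadPoolExecutor(max_workers=20) as pool:
    ids = list(pool.map(lambda _: registrar.get_or_register("my-flow", "cfg-v1"), range(50)))
print(len(calls), set(ids))  # 1 {'my-flow:1'}
```

The lock is per family rather than per config on purpose: even two different configs of the same family are registered one at a time.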

James Gatter

03/19/2025, 6:54 PM

Yeah same here, I wouldn't have expected a new task definition to be registered by runs of the same deployment. I'm only changing the parameters between runs.

James Gatter

03/19/2025, 6:57 PM

Helpful to know... perhaps I'll have to schedule things further apart, or maybe lower my concurrency limit from 20
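If every run really did register a task definition, here is a rough way to size the spacing, assuming the bucket numbers quoted above (capacity 20, refill 1 token/s) and a bucket that starts full. `earliest_offsets` is a hypothetical helper: request k (counting from 0) needs k + 1 tokens in total, 20 of which are available immediately, so it can start no earlier than (k + 1 - 20) / 1 seconds.

```python
def earliest_offsets(n_runs: int, capacity: int = 20, refill_per_sec: float = 1.0) -> list[float]:
    """Earliest start offset (seconds) for each of n_runs back-to-back registrations,
    assuming a token bucket that starts full with `capacity` tokens."""
    # Request k (0-indexed) needs k + 1 tokens in total: `capacity` are available up front,
    # the rest must refill at `refill_per_sec`.
    return [max(0.0, (k + 1 - capacity) / refill_per_sec) for k in range(n_runs)]


offsets = earliest_offsets(40)
print(offsets[19], offsets[20], offsets[-1])  # 0.0 1.0 20.0
```

So a batch of 40 could start 20 registrations immediately but would need about 20 more seconds for the rest; spacing beyond that only helps if other registrations are competing for the same bucket.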
