# ask-community
b
Hey Community, as we have been scaling out our Prefect usage, we are starting to be quite hampered by a few limitations in how Prefect handles ECS and its Fargate task definition registration. I keep running into this error:
an error occurred (ClientException) when calling the RegisterTaskDefinition operation: Too many concurrent attempts to create a new revision of the specified family
I have followed the instructions here, but the behaviour has not changed. We run about 2,500 flow runs every 24 hours; they are generally short (~60 seconds) and run every 15 minutes or so. I use the AWS CDK, and this is how I inject the variables into my Fargate ECS agent:
# Add the Prefect agent container to the Fargate task definition.
fargate_task_definition.add_container(
    id="prefect-fargate-agent-container",
    container_name="prefect-fargate-agent-container",
    image=ecs.ContainerImage.from_ecr_repository(
        prefect_agent_image_repo, "latest"
    ),
    cpu=256,
    memory_limit_mib=256,
    logging=ecs.AwsLogDriver(stream_prefix="prefect-fargate-agent-container"),
    environment={
        "AWS_DEFAULT_REGION": "ap-southeast-2",
        "LAUNCH_TYPE": "FARGATE",
        # botocore retry settings, intended to absorb throttling errors
        "AWS_RETRY_MODE": "adaptive",
        "AWS_MAX_ATTEMPTS": "100",
        "TASK_ROLE_ARN": task_role.role_arn,
        "LABEL": "prefect-agent-fargate",
        "NAME": "prefect-fargate-agent",
    },
    secrets={
        # Agent auth token, read from an SSM parameter
        "PREFECT__CLOUD__AGENT__AUTH_TOKEN": ecs.Secret.from_ssm_parameter(
            prefect_cloud_agent_auth_token
        )
    },
)
Is there anyone who might be able to help me fix this issue? It's really hurting us at the moment. Cheers
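For context, `AWS_RETRY_MODE` and `AWS_MAX_ATTEMPTS` are the standard botocore environment variables for client-side retries. Whether botocore treats this particular `ClientException` as retryable is not established in this thread, so the following is only a rough sketch of the general idea (capped exponential backoff with jitter). `ThrottledError` and `call_with_backoff` are hypothetical names used for illustration, not part of Prefect or boto3.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the ClientException from RegisterTaskDefinition."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=20.0, sleep=time.sleep):
    """Call operation(), retrying on ThrottledError with capped backoff and full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Full jitter: wait a random time up to the capped exponential delay,
            # so concurrent callers do not all retry at the same instant.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

The jitter matters here because the error is about concurrent attempts: retrying every caller after the same fixed delay would likely just recreate the collision.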
z
Hey @Ben Muller did you also get a chance to try running an ECS agent using the changes here? https://github.com/PrefectHQ/prefect/pull/4380/files
b
I didn't do that. So do I just add
--task-definition-add-uuid true
?
z
Ah, I'm sorry, I lost track of that PR; I thought we had merged it. Testing the change would actually involve (1) making the code changes from that PR, (2) building a custom Prefect version from the modified code, and (3) setting the flag you mentioned.
The final fix is being worked through in this PR, which you may have already found: https://github.com/PrefectHQ/prefect/issues/4402
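Judging only from the flag name `--task-definition-add-uuid` (the PR's actual behaviour is not shown in this thread), the idea appears to be giving each registration its own task definition family so that concurrent registrations do not compete for revisions of a single family. A minimal sketch of that naming idea, where `unique_family` is a hypothetical helper and not Prefect's implementation:

```python
import uuid

# ECS family names may be up to 255 characters long.
MAX_FAMILY_LENGTH = 255


def unique_family(base_family: str) -> str:
    """Return base_family with a random UUID suffix (hypothetical naming scheme)."""
    suffix = uuid.uuid4().hex  # 32 hexadecimal characters
    # Truncate the base name so that name + "-" + suffix stays within the limit.
    base = base_family[: MAX_FAMILY_LENGTH - len(suffix) - 1]
    return f"{base}-{suffix}"
```

One trade-off to keep in mind: every call produces a new family rather than a new revision of an existing one, so task definitions would accumulate unless something deregisters them.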
b
Any idea of a time frame for this? It would be so useful for us. We are battling to develop any more flows because of this.
z
I’ll check with the team today
Hey @Ben Muller, I checked with the team. We're prioritizing a fix for this and hope to get one out in the next few weeks. I'll do my best to keep you updated; in the meantime, I'd recommend following the relevant GitHub issues/PRs for updates. Thanks for bringing this issue to our attention.
b
Thanks for the update, @Zach Angell. Have a good weekend.
z
you too!
b
Hey, just a heads up: I just realised I was only getting that ECS error for one particular flow. It happened because I had multiple clocks for the same flow and a couple of them were on the same schedule. So I think ECS didn't like registering two task definitions from the same family at the same time? Anyway, I think that should probably be an easier fix for you guys.
I ended up changing the schedules so that they aren't exactly the same and am no longer getting the errors. Not ideal, but a suitable workaround for now.
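The staggering workaround described above could be sketched roughly as follows, using only the standard library. `staggered_starts` and the 30-second offset are illustrative assumptions, not values from the thread; the resulting start times would then be passed to whatever clock objects the flow uses.

```python
from datetime import datetime, timedelta


def staggered_starts(first_start, n_clocks, offset=timedelta(seconds=30)):
    """Return one start time per clock, each shifted by a further `offset`."""
    return [first_start + i * offset for i in range(n_clocks)]


# Three clocks that would otherwise all fire at midnight.
starts = staggered_starts(datetime(2021, 1, 1, 0, 0), 3)
```

For clocks sharing a 15-minute interval, the total shift (`n_clocks * offset`) needs to stay below the interval, otherwise a later clock lines up with an earlier one again.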
@Kevin Kho
Ignore everything I've said 😂. Errors are still occurring 🤷‍♂️
r
Hi @Ben Muller, did you manage to solve the issue? Did you try the new Prefect version which solves this bug, or did you have some workaround? Asking because we are encountering the same issue.
b
Hi @Roey Brecher, I believe the team has pushed an update that handles this, but I was naughty and combined them into one flow that maps over several, instead of having them all as separate flows. I haven't had time to switch it back and tell if it's fixed.