# prefect-cloud
i
Hi all! I got this error on two of five flow runs when running 5 instances of a deployment at once using an ECS push work pool:
Flow run could not be submitted to infrastructure: An error occurred (ClientException) when calling the RegisterTaskDefinition operation: Too many concurrent attempts to create a new revision of the specified family.
When I re-launched the jobs, they executed successfully. How can I solve this issue? I know there were some issues with the ECS Agent back in 2021: https://github.com/PrefectHQ/prefect/issues/4402 and there is an ongoing issue with Terraform: https://github.com/hashicorp/terraform-provider-aws/issues/9777 But I'm not sure how to solve this at the level of the ECS push work pool settings.
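Since re-launching the failed runs worked, one client-side workaround is to retry the throttled call with a randomized delay. The sketch below is only an illustration of that idea and not Prefect's fix: `TooManyConcurrentAttempts`, `retry_with_backoff`, and the `action` callable are hypothetical names standing in for whatever code triggers the registration.

```python
import random
import time


class TooManyConcurrentAttempts(Exception):
    """Hypothetical stand-in for the ClientException quoted above."""


def retry_with_backoff(action, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action(); on the throttling error, wait a random time and retry."""
    for attempt in range(attempts):
        try:
            return action()
        except TooManyConcurrentAttempts:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the original error
            # Jittered exponential backoff: wait up to base_delay * 2**attempt
            # seconds so that concurrent callers do not retry in lockstep.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

The random jitter matters here because all five runs fail at the same moment; a fixed delay would simply make them collide again.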
j
Hey! We are aware of this and working on a fix, I can let you know when it is live
🙌 2
Hey apologies for the delay here, the fix for this should be live now! If you wouldn't mind giving it another shot and letting me know how it goes?
m
This issue is still present. Can I define a family in the prefect.yaml for this specific job to use, so that the error doesn't occur?
j
You can define a family on the work pool or at the deployment level (in prefect.yaml). If you don't pass one, one is generated for you using the deployment and work pool ids
m
I'll give it a shot. The work pool just uses the default of `prefect` now. Even if I define the family, won't it still publish a new revision each time?
j
You're using an ECS Push Pool?
A new revision is only published if something in the task definition has changed
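To make the "only if something has changed" rule concrete, here is a deliberately simplified sketch of such a comparison. It assumes that a cached definition is reusable when every field in the request has the same value in the cached copy; the real check in prefect-aws is likely more involved, and `definition_matches` is a hypothetical name.

```python
def definition_matches(cached: dict, requested: dict) -> bool:
    """True when every requested field has the same value in the cached definition.

    Extra fields that only exist on the cached copy (for example a revision
    number filled in by AWS) are ignored.
    """
    return all(cached.get(key) == value for key, value in requested.items())


cached = {"family": "prefect", "cpu": "4096", "memory": "8192", "revision": 518}

# Same inputs: the cached definition can be reused, nothing is registered.
unchanged = definition_matches(cached, {"family": "prefect", "cpu": "4096", "memory": "8192"})

# Any differing field means a new revision has to be registered.
changed = definition_matches(cached, {"family": "prefect", "cpu": "2048", "memory": "8192"})
```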
m
We are using pull pools for scheduled jobs, but maybe I will look into it. It could be that the default configuration on the work pool is missing fields needed to establish a 'valid' configuration. This is without a family defined:
Retrieving ECS task definition 'arn:aws:ecs:us-east-2:XXXXXXXXX:task-definition/prefect:518'...

Cached task definition 'arn:aws:ecs:us-east-2:XXXXXXXXX:task-definition/prefect:518' does not meet requirements

Registering ECS task definition...

Task definition request{
  "cpu": "4096",
  "memory": "8192",
  "executionRoleArn": "arn:aws:iam::XXXXXXXXX:role/prefect_ecs_execution_role",
  "containerDefinitions": [
    {
      "image": "XXXXXXXXX.dkr.ecr.us-east-2.amazonaws.com/py310:1.0.1",
      "name": "prefect",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "prefect",
          "awslogs-region": "us-east-2",
          "awslogs-stream-prefix": "Attribution Flow"
        }
      }
    }
  ],
  "family": "prefect",
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc"
j
Sorry, I think I got a little confused here! The original issue in this thread is regarding:
• ECS Push Pools
• Too many concurrent attempts to create a new revision of the same family
You are not using push pools but are experiencing the same error, correct? The above logs look like your cached task definition is different than the one generated by your input values.
m
To my knowledge we do not have a push pool, as we started up a worker on ECS which created this pool. We do receive the concurrent request error for one of our jobs, which submits 8 separate flows. If we assign it a family with all of the proper configuration (which might not meet requirements as of now), it should stop registering new task definitions and thus alleviate the issue?
I'm guessing it might not work, since another process kicks off these flows with different names (the same names each time), so the log stream prefix is different for each one of the 8 combinations:
my flow abc
my flow def
my flow ghi
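A small sketch of why this matters: if the stream prefix is part of the task definition request, every distinct flow run name yields a distinct request. The request below is a trimmed-down copy of the one in the log above; `build_request`, `distinct_definitions`, and the `shared-prefix` value are hypothetical and only serve the illustration.

```python
import json


def build_request(stream_prefix: str) -> dict:
    """Trimmed-down task definition request, varying only the stream prefix."""
    return {
        "family": "prefect",
        "cpu": "4096",
        "memory": "8192",
        "containerDefinitions": [
            {
                "name": "prefect",
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": "prefect",
                        "awslogs-stream-prefix": stream_prefix,
                    },
                },
            }
        ],
    }


def distinct_definitions(prefixes) -> int:
    # Serialize with sorted keys so that equal requests compare equal as strings.
    return len({json.dumps(build_request(p), sort_keys=True) for p in prefixes})


run_names = ["my flow abc", "my flow def", "my flow ghi"]
per_run = distinct_definitions(run_names)                      # one definition per name
shared = distinct_definitions(["shared-prefix"] * len(run_names))  # a single definition
```

With per-run prefixes, all runs submitted at once compete to register a new revision of the same family, which is the situation the error message describes.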
j
ahhh interesting
m
Does that make sense? We iterate over an array to create all of the jobs, then that submits them
@flow(
    flow_run_name="Greek pnl Attribution Flow {strategy}",
    log_prints=True,
    retries=3,
)
Kinda an edge case - I might just add a longer sleep between calls to register
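The sleep-between-calls idea could look roughly like the sketch below. `submit_staggered` and the `submit` callable are hypothetical; `submit` stands for whatever call kicks off one flow run per strategy, and the 5-second default is an arbitrary assumption rather than a recommended value.

```python
import time


def submit_staggered(strategies, submit, delay_seconds=5.0, sleep=time.sleep):
    """Call submit(strategy) for each item, pausing between consecutive calls."""
    results = []
    for index, strategy in enumerate(strategies):
        if index > 0:
            # Space out submissions so registrations do not hit ECS at once.
            sleep(delay_seconds)
        results.append(submit(strategy))
    return results
```

This only reduces the chance of a collision; it does not remove the underlying cause, which is that each run registers its own task definition.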
j
That does. If the task definition changes with each flow run, it will need to register another one each time. At the moment each task definition is only cached per deployment.
m
Gotcha. No worries
A solution would be to add the awslogs-stream-prefix in the job variables to be the same for all runs, but not sure if that can be passed through.
ecs_work_pool: &ecs_work_pool_greekpnl
  name: ecs-worker-pool
  work_queue_name: default
  job_variables:
    containerDefinitions:
      - logConfiguration:
          logDriver: awslogs
          options:
            awslogs-group: prefect
            awslogs-region: us-west-2
            awslogs-create-group: true
            awslogs-stream-prefix: My Flow
j
It looks like it's only using the flow run name here if you don't pass a name at the top level of the configuration:
job_variables:
  name: My Config

if configuration.configure_cloudwatch_logs:
    container["logConfiguration"] = {
        "logDriver": "awslogs",
        "options": {
            "awslogs-create-group": "true",
            "awslogs-group": "prefect",
            "awslogs-region": region,
            "awslogs-stream-prefix": configuration.name or "prefect",
            **configuration.cloudwatch_logs_options,
        },
    }
m
I will try that. The above config didn't work for my use case, it still passed in the default stuff for container definitions
j
and then configuration.name is set as
self.name = self.name or flow_run.name
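Putting the two quoted lines together, the stream prefix appears to be resolved roughly as in this standalone sketch. `resolve_stream_prefix` is a hypothetical helper written for illustration, not a function in prefect-aws; it only mirrors the fallback order visible in the snippets quoted in this thread.

```python
def resolve_stream_prefix(configured_name, flow_run_name, cloudwatch_logs_options=None):
    """Mimic the fallback order seen in the quoted snippets."""
    # Mirrors `self.name = self.name or flow_run.name`.
    name = configured_name or flow_run_name
    options = {
        # Mirrors `"awslogs-stream-prefix": configuration.name or "prefect"`.
        "awslogs-stream-prefix": name or "prefect",
        # In the quoted snippet the extra options are unpacked last,
        # so a prefix given there would override the one derived from the name.
        **(cloudwatch_logs_options or {}),
    }
    return options["awslogs-stream-prefix"]


with_name = resolve_stream_prefix("My Config", "my flow abc")
without_name = resolve_stream_prefix(None, "my flow abc")
```

Here `with_name` is the same for every run, while `without_name` follows the flow run name and therefore changes the task definition from run to run.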
m
perfect!
j
if you get a chance would you mind filing an issue for this on the prefect-aws repo? This seems like a behavior we shouldn't do by default 😅 the flow run name as stream-prefix is nice but maybe not at the expense of registering new each time
m
Yeah in regards to the container definitions / prefix within the prefect.yaml?
It's working when adding family and name. Thanks @Jake Kaplan - I'll submit a ticket with the description and the information you provided
🙌 1