k
Hello all, wondering if anyone has run into the following issue: using an ECS run config, submitting a new flow run fails with a ClientError on the register task definition call. I am using a custom yml file in my agent (which also runs on ECS) to define the parameters for the flow run job def. I recently updated the yml file (increasing the memory size given to the flow container). After inspecting CloudTrail, it seems the `RegisterTaskDefinition` API call is being made with no parameters.
I was able to find the traceback in the agent logs:
2022-01-10 16:22:20 Traceback (most recent call last):
2022-01-10 16:22:20   File "/usr/local/lib/python3.7/site-packages/prefect/agent/agent.py", line 391, in _deploy_flow_run
2022-01-10 16:22:20     deployment_info = self.deploy_flow(flow_run)
2022-01-10 16:22:20   File "/usr/local/lib/python3.7/site-packages/prefect/agent/ecs/agent.py", line 295, in deploy_flow
2022-01-10 16:22:20     resp = self.ecs_client.register_task_definition(**taskdef)
2022-01-10 16:22:20   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
2022-01-10 16:22:20     return self._make_api_call(operation_name, kwargs)
2022-01-10 16:22:20   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 705, in _make_api_call
2022-01-10 16:22:20     raise error_class(parsed_response, operation_name)
2022-01-10 16:22:20 botocore.errorfactory.ClientException: An error occurred (ClientException) when calling the RegisterTaskDefinition operation: No Fargate configuration exists for given values.
a
Can you share your ECSRun run config? It's hard to tell what the issue may be otherwise.
k
What's the best way to find that run config exactly? I feel like there are configurations related to ECSRun spread across:
• The agent configuration (via `prefect agent ecs start`, I am passing `--task-definition /root/.prefect/flow_task_def.yml` and `--run-task-kwargs /root/.prefect/flow_run_task_kwargs.yml`)
• I am specifying an image on my flows before I register them: `flow.run_config = ECSRun(image=image)`
Currently my whole system (server backend, ECS agent, etc.) is non-functional. All flow submissions trigger this client exception. I tried submitting a flow and specifying a custom value for memory (which is what I was trying to change in the first place). This seemed to make it work. I can see in CloudTrail that the RegisterTaskDefinition API call is now populated:
"requestParameters": {
    "family": "prefect-*****",
    "taskRoleArn": "arn:aws:iam::**********:role/prefect-flow-task-auth-application-prod",
    "executionRoleArn": "arn:aws:iam::**********:role/ecs-execution-global-prod",
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "flow",
            "image": "**********.dkr.ecr.us-west-2.amazonaws.com/*********",
            "cpu": 0,
            "environment": [
                {
                    "name": "PREFECT__CONTEXT__IMAGE",
                    "value": "***REDACTED***"
                }
            ]
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "4096",
    "memory": "10240",
    "tags": [
        {
            "key": "prefect:flow-id",
            "value": "3b045cea-9a05-4ba7-9f70-e7969c72e8fe"
        },
        {
            "key": "prefect:flow-version",
            "value": "33"
        }
    ]
},
Previously, this was null.
k
A bit unclear, is it working now? Could I see the `yml` with the changed memory size that broke things?
k
The yaml was:
networkMode: awsvpc
family: flows
cpu: 4096
memory: 8192
containerDefinitions:
  - name: flow
requiresCompatibilities:
  - FARGATE
executionRoleArn: arn:aws:iam::*****:role/ecs-execution-global-prod
taskRoleArn: arn:aws:iam::*****:role/prefect-flow-task-auth-application-prod
Now it's:
networkMode: awsvpc
family: flows
cpu: 4096
memory: 10240
containerDefinitions:
  - name: flow
requiresCompatibilities:
  - FARGATE
executionRoleArn: arn:aws:iam::*****:role/ecs-execution-global-prod
taskRoleArn: arn:aws:iam::*****:role/prefect-flow-task-auth-application-prod
It's not working per se.
The "default" path for registering a task def does not work. If I specify a custom value in the ECSRun config via the UI, it seems to work.
okay I might be a huge idiot
k
So 8192 worked and then changing to 10240 did not?
k
Correct, yeah.
But I should say that 8192 has been what's been set for a long time. Due to how stable things have been, I have not needed to redeploy the agent for months now. 🙂
So it could be some issue with just "the agent was updated".
I am trying:
• set it back to 8192
• deploy
• set it back to 10240
• deploy again
I also have DEBUG log level set on my agent now
I have no idea, seems to be working now
sorry
k
At the 10240? 🤷
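For anyone who lands on this thread with the same "No Fargate configuration exists for given values" error: Fargate only accepts certain task-level cpu/memory pairs, so a quick sanity check on the values in the task definition yaml can rule that out. Below is a minimal sketch, assuming the classic Fargate size table from the AWS ECS docs (the larger 8 and 16 vCPU sizes are left out). The names `FARGATE_MEMORY_BY_CPU` and `is_valid_fargate_size` are made up for this example and are not part of Prefect or boto3.

```python
# Classic Fargate task sizes: CPU units -> allowed memory values in MiB.
# Assumption: table taken from the AWS ECS task size documentation;
# the 8 vCPU and 16 vCPU sizes are intentionally omitted from this sketch.
FARGATE_MEMORY_BY_CPU = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu, memory):
    """Return True if the cpu/memory pair is a supported Fargate task size.

    Accepts ints or numeric strings (e.g. "4096"); anything that cannot be
    parsed as an integer, including None, is treated as invalid. Strings
    with units such as "4 vCPU" or "10 GB" are not handled here.
    """
    try:
        cpu_units, memory_mib = int(cpu), int(memory)
    except (TypeError, ValueError):
        return False
    return memory_mib in FARGATE_MEMORY_BY_CPU.get(cpu_units, [])


if __name__ == "__main__":
    # The two memory settings discussed in this thread, both with cpu: 4096.
    print(is_valid_fargate_size(4096, 8192))
    print(is_valid_fargate_size(4096, 10240))
```

By this table both 8192 and 10240 are valid with cpu 4096, so those values alone should not trigger the error. A mistyped value such as 10420, or missing cpu/memory values, would fail the check.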