https://prefect.io logo
Title
r

Rahul Kadam

08/31/2022, 11:18 AM
Hi Team, Sometimes we are seeing task failures in ECS with this error
Failed to start task for flow run ad860ca0-61df-4055-b3f6-7f48c2d9ead4. Failures: [{'arn': 'arn:aws:ecs:us-east-1:XXXXXXXXXXXX:container-instance/XXXXXXXXXXXX', 'reason': 'RESOURCE:CPU'}, {'arn': 'arn:aws:ecs:us-east-1:XXXXXXXXXXXX:container-instance/XXXXXXXXXXXX', 'reason': 'RESOURCE:MEMORY'}, {'arn': 'arn:aws:ecs:us-east-1:XXXXXXXXXXXX:container-instance/XXXXXXXXXXXX', 'reason': 'RESOURCE:MEMORY'}]
Understandably, its because enough resources are not available on ECS nodes to schedule the tasks when launch type is EC2. But as per my understanding, in such cases the ECS task should get scheduled with "Provisioning" state, and ECS cluster should scale out and add more instances after which the task would run. However in our case we see above error and the task just fails to schedule. Is there a specific parameter we are missing which will provide the ECS feature that i mentioned above ? or this is the expected pattern ?
1
c

Christopher Boyd

08/31/2022, 2:16 PM
HI Rahul, This would be based on your auto scaling for the cluster - there could be limits, or constraints set on the cluster preventing that, or they could not be set at all so it does not scale? https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cluster-auto-scaling.html