<@ULVA73B9P> i have attached auto scaling group fo...
# ask-marvin
r
@Marvin I have attached an Auto Scaling group to my ECS cluster, with a minimum of two and a maximum of four instances. One instance can handle 7 ECS tasks at a time. When I submitted 20 tasks at once in a loop, 6 of them failed with the error `Failed to run ECS task: RESOURCE:CPU`. Ideally the ASG should have handled this automatically, shouldn't it? What triggers the scale-out?
m
Short answer: your ASG didn't scale because nothing you've configured tells it to scale for one-off ECS `RunTask` bursts. With 2 instances that can each run ~7 tasks, you had capacity for ~14 tasks; 6 out of 20 failed with `RESOURCE:CPU` because there wasn't enough EC2 capacity at the moment of placement.

What's going on
- `RESOURCE:CPU` means the ECS scheduler couldn't find enough free CPU on any container instance to place the task. It doesn't queue tasks by default; the placement just fails.
- An Auto Scaling group only scales when a scaling policy or alarm triggers. Typical out-of-the-box ASG target tracking on EC2 instance CPU won't help here: CPU on your instances was not high because the extra tasks never got placed.
- ECS will only "auto-scale the EC2 capacity" for you if you wire up ECS Capacity Providers with Managed Scaling and launch work through them, typically with an ECS Service. One-off `RunTask` calls only benefit if they specify a capacity provider strategy, and even then scale-out generally does not arrive in time for a burst.
- Scale-out also takes time (instance launch plus bootstrap), so even with correct wiring you need retries so tasks can place after new capacity arrives.

What actually triggers scale
- Without capacity providers: whatever CloudWatch alarms or target-tracking policies you attached to the ASG (e.g., instance CPU, custom metrics). If you haven't configured any, nothing will trigger.
- With ECS Capacity Providers + Managed Scaling (typically on a Service): ECS attaches a target-tracking policy to the ASG based on the capacity provider's target capacity, and scales the ASG when tasks launched through the capacity provider cannot be placed. See ECS Capacity Providers and Managed Scaling and ECS Service Auto Scaling.

Ways to fix this
- Easiest: use the Fargate launch type so you don't manage EC2 capacity. There is no ASG to scale, and your 20 tasks will just start (subject to Fargate account limits).
- If you must use EC2:
  - Use an ECS Service with a Capacity Provider (Managed Scaling on) and submit work by changing the service desired count or using queue-driven service autoscaling. This allows ECS to scale the ASG proactively.
  - If you stick with
`RunTask`, add proactive scaling signals for the ASG (e.g., target tracking on ECS cluster `CPUReservation`/`MemoryReservation`, or step scaling on a queue depth metric), and implement retries/backoff so tasks that initially fail can be resubmitted once new instances come up.
  - Consider placement strategies to reduce CPU fragmentation (e.g., `binpack` on `cpu`). If each task requests a specific `cpu` value, fragmentation can prevent placement even when "total" CPU seems sufficient. See Task placement strategies.
  - Pre-scale the ASG (temporarily bump desired capacity) ahead of expected bursts.
- Also verify that task CPU/memory requests and instance types align; overly large `cpu`/`memory` settings per task can make placement much harder.

If you're launching these via Prefect (3.x) ECS task runs
- Prefer Fargate for bursty workloads.
- If running on EC2, you can specify a capacity provider strategy in your ECS infrastructure so runs go to a capacity provider backed by an ASG with Managed Scaling, but you'll still want retries/backoff for bursts.

A couple of clarifying questions to guide next steps:
- Are you running these as one-off
`RunTask` calls (e.g., batch/burst jobs) or as part of an ECS Service?
- Do you have an ECS Capacity Provider with Managed Scaling attached to your cluster/ASG?
- If you're using Prefect to launch these, which launch type are you using (Fargate vs EC2), and do you have retries enabled for run placement failures?

Helpful docs
- ECS Capacity Providers and Managed Scaling: Docs
- ECS Service Auto Scaling: Docs
- Task placement and `RESOURCE:CPU` troubleshooting: Troubleshooting guide
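
To make the pre-scaling arithmetic and the retry/backoff advice concrete, here is a minimal sketch, not a definitive implementation. It assumes you call `RunTask` yourself through a boto3-style ECS client, whose `run_task` response carries `tasks` and `failures` lists. The helper names (`instances_needed`, `run_task_with_retry`) and the 30-second base delay are illustrative assumptions, not part of Prefect or AWS.

```python
import math
import time


def instances_needed(task_count, tasks_per_instance):
    """Smallest number of instances that can hold task_count tasks."""
    return math.ceil(task_count / tasks_per_instance)


def run_task_with_retry(ecs_client, run_task_kwargs, max_attempts=5,
                        base_delay=30.0, sleep=time.sleep):
    """Call RunTask, retrying with exponential backoff on RESOURCE:* failures.

    ecs_client is assumed to expose run_task(**kwargs) like boto3's ECS client.
    Returns the first started task, or raises RuntimeError.
    """
    for attempt in range(1, max_attempts + 1):
        response = ecs_client.run_task(**run_task_kwargs)
        if response.get("tasks"):
            return response["tasks"][0]

        # Placement failures such as RESOURCE:CPU are reported in "failures".
        reasons = [f.get("reason", "") for f in response.get("failures", [])]
        retryable = bool(reasons) and all(
            r.startswith("RESOURCE:") for r in reasons
        )
        if not retryable or attempt == max_attempts:
            raise RuntimeError(
                f"RunTask failed after {attempt} attempt(s): {reasons}"
            )

        # Wait 30s, 60s, 120s, ... so new instances have time to register.
        sleep(base_delay * 2 ** (attempt - 1))
```

With the numbers from this thread, `instances_needed(20, 7)` gives 3, which is within your ASG maximum of 4, so bumping desired capacity to 3 before the burst would cover all 20 tasks. For real use you would pass something like `boto3.client("ecs")` and your own cluster and task definition names as `run_task_kwargs`; the backoff only helps if something (a scaling policy, a capacity provider, or a manual pre-scale) is actually adding instances in the meantime.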