David Martin Calalang
08/12/2025, 5:20 PM
Marvin
08/12/2025, 5:20 PM
Marvin
08/12/2025, 5:23 PM
prefect.yaml. On ECS, image pulls from ECR are handled by the task’s execution role. If that role isn’t set or doesn’t have the right permissions/network path, your task will sit in PENDING looking thoroughly unmotivated.
What to do
- Attach an execution role to the ECS task definition your Prefect ECS work pool/worker uses. That role must have the managed policy AmazonECSTaskExecutionRolePolicy (which includes ECR pull and CloudWatch Logs permissions).
- Ensure the task can reach ECR. If you run in private subnets without a NAT, add VPC endpoints for ECR (api + dkr) and S3, or give the task a route to the internet.
- If your ECR repo is in a different account, update the ECR repository policy to allow the task execution role to pull.
- Make sure the Prefect worker itself has AWS permissions to register task definitions and run tasks (boto3 calls to ECS/Logs/ECR).
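For the cross-account case above, here is a minimal sketch of the kind of ECR repository policy that would let an execution role in another account pull images. The function name and the Sid are hypothetical, and the role ARN is a placeholder; the three actions are the image-pull actions ECR defines. The role still needs ecr:GetAuthorizationToken on its own identity policy, which the managed execution role policy includes.

```python
import json


def ecr_pull_policy(execution_role_arn: str) -> str:
    """Return an ECR repository policy (JSON) that lets one role pull images.

    Hypothetical helper for illustration; attach the result to the
    repository in the account that owns the image.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPull",  # arbitrary label
                "Effect": "Allow",
                "Principal": {"AWS": execution_role_arn},
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder ARN; substitute the execution role your tasks actually use.
    print(ecr_pull_policy("arn:aws:iam::123456789012:role/ecsTaskExecutionRole"))
```

The printed JSON could then be saved to a file and applied with `aws ecr set-repository-policy`, assuming you have permission to manage the repository.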
Where to set the role in Prefect
- In Prefect 3.x with an ECS work pool, set the executionRoleArn (and optionally taskRoleArn) via:
  - The work pool’s base job template (UI), or
  - Per-deployment work_pool.job_variables in your prefect.yaml.
Example prefect.yaml snippet (per-deployment override)
deployments:
- name: my-deployment
  entrypoint: path/to/flow.py:flow
  work_pool:
    name: my-ecs-pool
    job_variables:
      cluster: arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster
      launch_type: FARGATE
      task_definition:
        cpu: "512"
        memory: "1024"
        executionRoleArn: arn:aws:iam::123456789012:role/ecsTaskExecutionRole
        taskRoleArn: arn:aws:iam::123456789012:role/myAppTaskRole
      network_configuration:
        awsvpcConfiguration:
          subnets:
          - subnet-abc...
          - subnet-def...
          securityGroups:
          - sg-123...
          assignPublicIp: DISABLED
  build:
    image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest
Notes
- You do not need repositoryCredentials for ECR. That’s for private registries like Docker Hub, with the credentials stored in Secrets Manager. If you ever need it, it is set under task_definition.containerDefinitions[0].repositoryCredentials.credentialsParameter, but again, not for ECR.
- Check ECS task “Events” in the AWS console; if it’s an image pull problem, you’ll see a clear reason (permissions, network, or “no basic auth credentials” for non‑ECR).
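As a rough aid for reading those stop reasons, the sketch below sorts a stopped-task reason string into the three buckets mentioned above. The function name and the substrings it matches are assumptions chosen for illustration, not an exhaustive or official list of ECS error messages.

```python
def classify_pull_failure(stopped_reason: str) -> str:
    """Guess the likely cause of an ECS image pull failure.

    Heuristic only: the substrings below are illustrative assumptions,
    so treat the result as a hint for where to look first.
    """
    reason = stopped_reason.lower()
    if "cannotpullcontainererror" not in reason:
        return "not-an-image-pull-error"
    # Timeouts usually mean the task has no network path to the registry.
    if "i/o timeout" in reason or "context deadline exceeded" in reason:
        return "network"
    # Typical for non-ECR registries missing repositoryCredentials.
    if "no basic auth credentials" in reason:
        return "registry-credentials"
    # Explicit denials point at the execution role or repository policy.
    if "accessdenied" in reason or "not authorized" in reason or "403 forbidden" in reason:
        return "permissions"
    return "unknown"


if __name__ == "__main__":
    sample = (
        "CannotPullContainerError: pull image manifest has been retried "
        "7 time(s): failed to do request: dial tcp 10.0.0.1:443: i/o timeout"
    )
    print(classify_pull_failure(sample))
```

A "network" result suggests checking subnets, NAT, and VPC endpoints before touching IAM.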
Useful links
- ECS task execution role (required for ECR pulls)
- ECR authentication overview
- prefect-aws docs (ECS work pool/worker details)
If you share:
- whether you’re using private subnets (and NAT/VPC endpoints),
- the execution role attached (if any),
- and whether your ECR is cross-account,
…I can point at the exact missing piece, sparing us both more PENDING.
David Martin Calalang
08/12/2025, 5:24 PM
Task stopped at: 2025-08-12T17:22:00.603Z Task stopped with stop code: "TaskFailedToStart" and reason: "CannotPullContainerError: pull image manifest has been retried 7 time(s): failed to resolve ref {private_registry}: failed to do request: Head "{private_registry}": dial tcp {ip}: i/o timeout".
Marvin
08/12/2025, 5:24 PM
Marvin
08/12/2025, 5:25 PM
assignPublicIp: ENABLED on the task. If pulls start working, it’s definitely a private-subnet-without-egress issue.
- Proper fix (no public IP): either route those subnets through a NAT gateway (the gateway itself sits in a public subnet), or add VPC endpoints:
  - Interface endpoints: com.amazonaws.<region>.ecr.api and com.amazonaws.<region>.ecr.dkr
  - Gateway endpoint: com.amazonaws.<region>.s3
- Ensure VPC DNS hostnames/resolution are enabled, and security groups allow 443:
  - Your task ENI SG must allow egress on 443 to the endpoint ENI SG
  - The endpoint ENI SG must allow inbound 443 from your task ENI SG
- Verify the execution role is set and has the managed policy AmazonECSTaskExecutionRolePolicy (needed for ECR API calls/logs). Wrong permissions usually don’t cause timeouts, but you need this anyway.
- Make sure the ECR registry and the ECS task are in the same region (cross‑region pulls plus no internet == timeouts).
- If this is a non‑ECR private registry, you still need outbound connectivity to that registry, and you’d use repositoryCredentials (Secrets Manager) in the ECS task definition. For ECR, you do not need repositoryCredentials.
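To make the endpoint checklist above concrete, here is a small sketch that reports which of the three endpoint services named in the list are absent from a VPC. The function name is hypothetical, and it assumes you have already collected the service names of the VPC's existing endpoints, for example from the output of `aws ec2 describe-vpc-endpoints`.

```python
from typing import Iterable, List


def missing_ecr_endpoints(region: str, existing_service_names: Iterable[str]) -> List[str]:
    """Return the endpoint services needed for private ECR pulls that are absent.

    Hypothetical helper: it only compares names and makes no AWS calls.
    """
    required = [
        f"com.amazonaws.{region}.ecr.api",  # interface endpoint
        f"com.amazonaws.{region}.ecr.dkr",  # interface endpoint
        f"com.amazonaws.{region}.s3",       # gateway endpoint (image layers)
    ]
    have = set(existing_service_names)
    return [name for name in required if name not in have]


if __name__ == "__main__":
    # Example: a VPC that only has the S3 gateway endpoint so far.
    print(missing_ecr_endpoints("us-east-1", ["com.amazonaws.us-east-1.s3"]))
```

An empty result only says the endpoints exist; the DNS and security group rules listed above still have to be checked separately.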
How to set the quick test in your Prefect deployment (Prefect 3.x, ECS work pool):
work_pool:
  name: my-ecs-pool
  job_variables:
    network_configuration:
      awsvpcConfiguration:
        subnets:
        - subnet-abc...
        - subnet-def...
        securityGroups:
        - sg-123...
        assignPublicIp: ENABLED
If you’re keeping tasks private, configure endpoints or NAT and flip assignPublicIp back to DISABLED.
Helpful references
- ECS task execution role: AWS docs
- VPC endpoints for ECR: AWS docs
- Pulling ECR images from private subnets (needs ECR + S3): AWS docs
Can you share:
- Fargate or EC2 launch type?
- Whether the task subnets are private and whether a NAT or VPC endpoints are configured?
- Your current assignPublicIp setting?
That’ll let me point to the exact missing piece.