David Martin Calalang
08/12/2025, 5:20 PM
Marvin
08/12/2025, 5:23 PM
prefect.yaml. On ECS, image pulls from ECR are handled by the task’s execution role. If that role isn’t set or lacks the right permissions, or the task has no network path to ECR, your task will sit in PENDING looking thoroughly unmotivated.
What to do
- Attach an execution role to the ECS task definition your Prefect ECS work pool/worker uses. That role must have the managed policy AmazonECSTaskExecutionRolePolicy (which includes ECR pull and CloudWatch Logs permissions).
- Ensure the task can reach ECR. If you run in private subnets without a NAT, add VPC endpoints for ECR (api + dkr) and S3, or give the task a route to the internet.
- If your ECR repo is in a different account, update the ECR repository policy to allow the task execution role to pull.
- Make sure the Prefect worker itself has AWS permissions to register task definitions and run tasks (boto3 calls to ECS, CloudWatch Logs, and ECR), plus iam:PassRole on the execution and task roles it hands to ECS.
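The cross-account repository policy and the worker permissions above can be sketched as IAM policy documents built in Python and printed as JSON. This is a minimal sketch under assumptions: the account ID and role names are the placeholders used elsewhere in this thread, and the worker's action list is an assumed starting point (prefect-aws may need more, for example ecs:TagResource if tags are applied to runs), not a verified minimum. The repository policy goes on the ECR repo in the account that owns it; the worker policy attaches to whatever identity the worker runs as.

```python
import json

# Placeholder ARNs (hypothetical); substitute your own account and role names.
EXECUTION_ROLE_ARN = "arn:aws:iam::123456789012:role/ecsTaskExecutionRole"
TASK_ROLE_ARN = "arn:aws:iam::123456789012:role/myAppTaskRole"


def ecr_repository_policy(execution_role_arn: str) -> dict:
    """Repository policy letting another account's execution role pull images.

    Repository policies omit "Resource": the repository itself is the resource.
    ecr:GetAuthorizationToken is not granted here; it comes from the role's
    own identity policy (e.g. AmazonECSTaskExecutionRolePolicy).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPull",
                "Effect": "Allow",
                "Principal": {"AWS": execution_role_arn},
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                ],
            }
        ],
    }


def worker_policy(role_arns: list[str]) -> dict:
    """Assumed starting-point identity policy for the Prefect ECS worker."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RunEcsTasks",
                "Effect": "Allow",
                "Action": [
                    "ecs:RegisterTaskDefinition",
                    "ecs:DeregisterTaskDefinition",
                    "ecs:DescribeTaskDefinition",
                    "ecs:RunTask",
                    "ecs:DescribeTasks",
                    "ecs:StopTask",
                    "logs:GetLogEvents",
                    "ec2:DescribeVpcs",
                    "ec2:DescribeSubnets",
                ],
                "Resource": "*",
            },
            {
                # Without this, RunTask fails when the task definition names roles.
                "Sid": "PassTaskRoles",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": role_arns,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(ecr_repository_policy(EXECUTION_ROLE_ARN), indent=2))
    print(json.dumps(worker_policy([EXECUTION_ROLE_ARN, TASK_ROLE_ARN]), indent=2))
```

Scoping iam:PassRole to the two role ARNs (rather than "*") keeps the worker from passing arbitrary roles to ECS.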
Where to set the role in Prefect
- In Prefect 3.x with an ECS work pool, set the execution role ARN (and optionally the task role ARN) via:
  - the work pool’s base job template (UI), or
  - per-deployment work_pool.job_variables in your prefect.yaml.
Example prefect.yaml snippet (per-deployment override)
deployments:
  - name: my-deployment
    entrypoint: path/to/flow.py:flow
    work_pool:
      name: my-ecs-pool
      job_variables:
        cluster: arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster
        launch_type: FARGATE
        # ECR image URI, pulled by the execution role
        image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest
        cpu: 512
        memory: 1024
        # prefect-aws exposes the roles as these job variables
        execution_role_arn: arn:aws:iam::123456789012:role/ecsTaskExecutionRole
        task_role_arn: arn:aws:iam::123456789012:role/myAppTaskRole
        network_configuration:
          awsvpcConfiguration:
            subnets:
              - subnet-abc...
              - subnet-def...
            securityGroups:
              - sg-123...
            assignPublicIp: DISABLED
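A quick pre-flight check on the job variables can catch a missing execution role, a region mismatch, or a cross-account image before the task ever reaches PENDING. This is a hypothetical helper, not part of Prefect. It assumes the role is set through the execution_role_arn job variable (the name used by prefect-aws’s default ECS base job template), that cluster is a full ARN, and that everything lives in the standard aws partition.

```python
import re

# <account>.dkr.ecr.<region>.amazonaws.com/<repo>[:tag]  (standard "aws" partition only)
ECR_IMAGE = re.compile(r"^(\d{12})\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com/(.+)$")
# arn:aws:ecs:<region>:<account>:cluster/<name>
CLUSTER_ARN = re.compile(r"^arn:aws:ecs:([a-z0-9-]+):(\d{12}):cluster/.+$")
# arn:aws:iam::<account>:role/<name>
ROLE_ARN = re.compile(r"^arn:aws:iam::(\d{12}):role/.+$")


def preflight(job_variables: dict) -> list[str]:
    """Return human-readable warnings for an ECS work pool's job_variables."""
    warnings = []

    if not ROLE_ARN.match(job_variables.get("execution_role_arn") or ""):
        warnings.append("execution_role_arn missing or not an IAM role ARN")

    image_ref = job_variables.get("image") or ""
    image = ECR_IMAGE.match(image_ref)
    if image_ref and not image:
        warnings.append("non-ECR image: needs egress to that registry and repositoryCredentials")

    cluster = CLUSTER_ARN.match(job_variables.get("cluster") or "")
    if image and cluster:
        image_account, image_region, _repo = image.groups()
        cluster_region, cluster_account = cluster.groups()
        if image_region != cluster_region:
            warnings.append(f"ECR region {image_region} != cluster region {cluster_region}")
        if image_account != cluster_account:
            warnings.append("cross-account ECR image: check the repository policy")

    return warnings
```

Usage: load the job_variables mapping from your prefect.yaml with any YAML parser and print preflight(job_variables); an empty list means none of these particular problems were detected, not that the pull will succeed.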
Notes
- You do not need repositoryCredentials for ECR. That’s for non-ECR private registries (for example, a private Docker Hub repo), with the credentials stored in Secrets Manager. If you ever need it, the secret’s ARN goes in task_definition.containerDefinitions[0].repositoryCredentials.credentialsParameter, but again, not for ECR.
- Check ECS task “Events” in the AWS console; if it’s an image pull problem, you’ll see a clear reason (permissions, network, or “no basic auth credentials” for non‑ECR).
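The stop reason text is usually enough to tell the failure classes above apart. Below is a rough, hypothetical classifier (plain substring heuristics, not an exhaustive catalogue of ECS messages), plus a fetch helper that uses boto3’s ECS describe_tasks call. boto3 is imported lazily so the classifier runs without it.

```python
def classify_pull_error(stopped_reason: str) -> str:
    """Heuristically map an ECS stoppedReason to the likely missing piece."""
    reason = stopped_reason.lower()
    # Fargate may report auth-stage failures as
    # "ResourceInitializationError: unable to pull secrets or registry auth".
    if "cannotpullcontainer" not in reason and "registry auth" not in reason:
        return "not an image pull failure"
    if "i/o timeout" in reason or "context deadline exceeded" in reason:
        return "network: no path to the registry (NAT, VPC endpoints, or public IP)"
    if "no basic auth credentials" in reason:
        return "credentials: non-ECR registry needs repositoryCredentials"
    if "accessdenied" in reason or "not authorized" in reason:
        return "permissions: execution role or ECR repository policy"
    if "not found" in reason:
        return "image: repository or tag does not exist"
    return "unrecognized pull failure: read the full message"


def fetch_stopped_reason(cluster: str, task_arn: str) -> str:
    """Fetch a stopped task's reason with boto3 (requires AWS credentials)."""
    import boto3  # lazy import: classify_pull_error works without boto3 installed

    ecs = boto3.client("ecs")
    tasks = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"]
    return tasks[0].get("stoppedReason", "") if tasks else ""
```

Usage: classify_pull_error(fetch_stopped_reason("my-cluster", task_arn)), where task_arn is the ARN shown for the stopped task in the console.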
Useful links
- ECS task execution role (required for ECR pulls)
- ECR authentication overview
- prefect-aws docs (ECS work pool/worker details)
If you share:
- whether you’re using private subnets (and NAT/VPC endpoints),
- the execution role attached (if any),
- and whether your ECR is cross-account,
…I can point at the exact missing piece, sparing us both more PENDING.
David Martin Calalang
08/12/2025, 5:24 PM
Task stopped at: 2025-08-12T17:22:00.603Z Task stopped with stop code: "TaskFailedToStart" and reason: "CannotPullContainerError: pull image manifest has been retried 7 time(s): failed to resolve ref {private_registry}: failed to do request: Head "{private_registry}": dial tcp {ip}: i/o timeout".
Marvin
08/12/2025, 5:24 PM
Marvin
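08/12/2025, 5:25 PM
assignPublicIp: ENABLED on the task. A public IP only gives the task a way out if its subnet’s route table sends 0.0.0.0/0 to an internet gateway; in a truly private subnet it does nothing, so run this test in a public subnet. If pulls start working, it’s definitely a private-subnet-without-egress issue.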
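- Proper fix (no public IP): either route those subnets’ outbound traffic (0.0.0.0/0) through a NAT gateway, or add VPC endpoints:
  - Interface endpoints: com.amazonaws.<region>.ecr.api and com.amazonaws.<region>.ecr.dkr
  - Gateway endpoint: com.amazonaws.<region>.s3 (ECR serves image layers from S3), associated with the task subnets’ route tables
  - Interface endpoint com.amazonaws.<region>.logs as well, if the task uses the awslogs log driver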
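- Ensure the VPC has DNS resolution and DNS hostnames enabled, private DNS is enabled on the interface endpoints, and security groups allow HTTPS (443):
  - the task ENI’s security group must allow outbound 443 to the endpoint ENIs’ security group
  - the endpoint ENIs’ security group must allow inbound 443 from the task ENI’s security group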
- Verify the execution role is set and has the managed policy AmazonECSTaskExecutionRolePolicy (needed for ECR API calls and logs). Wrong permissions usually surface as access errors rather than timeouts, but you need this anyway.
- Make sure the ECR registry and the ECS task are in the same region (VPC endpoints are regional, so a cross-region pull with no internet egress just times out).
- If this is a non‑ECR private registry, you still need outbound connectivity to that registry, and you’d use repositoryCredentials (Secrets Manager) in the ECS task definition. For ECR, you do not need repositoryCredentials.
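The egress checklist above can be turned into a small offline check against the shapes EC2 returns for route tables (Routes entries with DestinationCidrBlock, GatewayId, NatGatewayId) and VPC endpoints (ServiceName). A sketch under assumptions: you supply the routes of the task subnet’s route table and the ServiceName of each endpoint in the VPC yourself, and the function names are hypothetical. It does not verify DNS settings or existing security group rules; endpoint_sg_ingress only builds the IpPermissions you could pass to EC2’s authorize_security_group_ingress for the endpoint security group.

```python
def required_endpoints(region: str, uses_awslogs: bool = True) -> set[str]:
    """Endpoint service names a task without internet egress needs for ECR pulls."""
    services = {
        f"com.amazonaws.{region}.ecr.api",  # Interface endpoint: ECR API calls
        f"com.amazonaws.{region}.ecr.dkr",  # Interface endpoint: registry (docker) traffic
        f"com.amazonaws.{region}.s3",  # Gateway endpoint: image layers come from S3
    }
    if uses_awslogs:
        # Only needed when the task definition uses the awslogs log driver.
        services.add(f"com.amazonaws.{region}.logs")
    return services


def egress_diagnosis(
    routes: list[dict],
    endpoint_services: list[str],
    region: str,
    public_ip: bool,
    uses_awslogs: bool = True,
) -> str:
    """Say whether a task in this subnet has some path to ECR."""
    default_routes = [r for r in routes if r.get("DestinationCidrBlock") == "0.0.0.0/0"]
    if any(r.get("NatGatewayId") for r in default_routes):
        return "ok: NAT gateway"
    # A public IP helps only when the default route points at an internet gateway.
    if public_ip and any(r.get("GatewayId", "").startswith("igw-") for r in default_routes):
        return "ok: internet gateway with public IP"
    missing = required_endpoints(region, uses_awslogs) - set(endpoint_services)
    if not missing:
        return "ok: VPC endpoints (still check private DNS and port 443 rules)"
    return "blocked: missing " + ", ".join(sorted(missing))


def endpoint_sg_ingress(task_sg_id: str) -> list[dict]:
    """IpPermissions allowing HTTPS into the endpoint SG from the task SG."""
    # e.g. boto3.client("ec2").authorize_security_group_ingress(
    #     GroupId=endpoint_sg_id, IpPermissions=endpoint_sg_ingress(task_sg_id))
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "UserIdGroupPairs": [{"GroupId": task_sg_id}],
        }
    ]
```

Note the public_ip branch: with only a local route, egress_diagnosis still reports "blocked" even when public_ip is True, which is the private-subnet case where assignPublicIp: ENABLED cannot help.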
How to set the quick test in your Prefect deployment (Prefect 3.x, ECS work pool):
work_pool:
  name: my-ecs-pool
  job_variables:
    network_configuration:
      awsvpcConfiguration:
        # for this test, use subnets whose route table sends 0.0.0.0/0 to an internet gateway
        subnets:
          - subnet-abc...
          - subnet-def...
        securityGroups:
          - sg-123...
        assignPublicIp: ENABLED
If you’re keeping tasks private, configure endpoints or NAT and flip assignPublicIp back to DISABLED.
Helpful references
- ECS task execution role: AWS docs
- VPC endpoints for ECR: AWS docs
- Pulling ECR images from private subnets (needs ECR + S3): AWS docs
Can you share:
- Fargate or EC2 launch type?
- Whether the task subnets are private and whether a NAT or VPC endpoints are configured?
- Your current assignPublicIp setting?
That’ll let me point to the exact missing piece.