# ask-marvin
d
@Marvin I'm deploying a Prefect flow to AWS ECS. In my .yaml file, I specify that the image_name is coming from an ECR repository. Is there a way for me to provide credentials for this? Right now the task that represents the deployment run starts, but is stuck on Pending, and I believe it's an issue with pulling from ECR.
m
Short answer: you don’t pass ECR credentials in `prefect.yaml`. On ECS, image pulls from ECR are handled by the task’s execution role. If that role isn’t set, or lacks the right permissions or network path, your task will sit in PENDING looking thoroughly unmotivated.

What to do
- Attach an execution role to the ECS task definition your Prefect ECS work pool/worker uses. That role must have the managed policy `AmazonECSTaskExecutionRolePolicy` (which includes ECR pull and CloudWatch Logs permissions).
- Ensure the task can reach ECR. If you run in private subnets without a NAT, add VPC endpoints for ECR (api + dkr) and S3, or give the task a route to the internet.
- If your ECR repo is in a different account, update the ECR repository policy to allow the task execution role to pull.
- Make sure the Prefect worker itself has AWS permissions to register task definitions and run tasks (boto3 calls to ECS/Logs/ECR), plus `iam:PassRole` on the execution and task roles it hands to ECS.

Where to set the role in Prefect
- In Prefect 3.x with an ECS work pool, set the execution role ARN (the task definition’s `executionRoleArn`, and optionally `taskRoleArn`) via either:
  - the work pool’s base job template (UI), or
  - per-deployment `work_pool.job_variables` in your `prefect.yaml`.

Example `prefect.yaml` snippet (per-deployment override):
```yaml
deployments:
  - name: my-deployment
    entrypoint: path/to/flow.py:flow
    work_pool:
      name: my-ecs-pool
      job_variables:
        # The image is a job variable; `build:` is a list of build steps, not an image setting.
        image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest
        cluster: arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster
        launch_type: FARGATE
        cpu: 512
        memory: 1024
        # Execution role: used by ECS to pull the image and write logs.
        execution_role_arn: arn:aws:iam::123456789012:role/ecsTaskExecutionRole
        # Task role: the identity your flow code runs as.
        task_role_arn: arn:aws:iam::123456789012:role/myAppTaskRole
        network_configuration:
          awsvpcConfiguration:
            subnets:
              - subnet-abc...
              - subnet-def...
            securityGroups:
              - sg-123...
            assignPublicIp: DISABLED
```
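The image URI already encodes the registry’s account and region, so the cross-account case above (and a region mismatch with the task) can be spotted before anything runs. A minimal Python sketch, assuming standard private ECR URIs (`<account>.dkr.ecr.<region>.amazonaws.com/<repo>[:tag]`); `parse_ecr_image` and `pull_warnings` are hypothetical helpers, not part of Prefect or boto3:

```python
import re
from typing import NamedTuple, Optional

# Private ECR image URIs look like
#   <account>.dkr.ecr.<region>.amazonaws.com/<repository>[:<tag>][@<digest>]
# (China regions end in amazonaws.com.cn, which this pattern also accepts).
ECR_IMAGE_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com(?:\.cn)?"
    r"/(?P<repository>[^:@]+)(?::(?P<tag>[^@]+))?(?:@(?P<digest>.+))?$"
)


class EcrImage(NamedTuple):
    account: str
    region: str
    repository: str
    tag: Optional[str]


def parse_ecr_image(image: str) -> Optional[EcrImage]:
    """Split a private ECR image URI into its parts; return None for non-ECR images."""
    match = ECR_IMAGE_URI.match(image)
    if match is None:
        return None
    return EcrImage(match["account"], match["region"], match["repository"], match["tag"])


def pull_warnings(image: str, task_account: str, task_region: str) -> list[str]:
    """List pull prerequisites implied by where the image lives relative to the task."""
    parsed = parse_ecr_image(image)
    if parsed is None:
        return ["non-ECR image: needs outbound access, plus repositoryCredentials if private"]
    warnings = []
    if parsed.region != task_region:
        warnings.append(
            f"cross-region pull ({parsed.region} -> {task_region}): "
            "VPC endpoints in the task's region won't serve it"
        )
    if parsed.account != task_account:
        warnings.append(
            f"cross-account pull: repository policy in {parsed.account} "
            "must allow the execution role"
        )
    return warnings


if __name__ == "__main__":
    image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest"
    print(parse_ecr_image(image))
    print(pull_warnings(image, task_account="123456789012", task_region="us-east-1"))
```

An empty warning list only means the image location is unremarkable; the execution role and network path still need to be in place.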
Notes
- You do not need `repositoryCredentials` for ECR. That’s for other private registries (e.g. Docker Hub), with the credentials stored in Secrets Manager. If you ever need it, it goes under `task_definition.containerDefinitions[0].repositoryCredentials.credentialsParameter`, but again, not for ECR.
- Check the stopped task’s details in the ECS console: the “Stopped reason” tells you whether it’s an image pull problem and why (permissions, network, or “no basic auth credentials” for non‑ECR).

Useful links
- ECS task execution role (required for ECR pulls)
- ECR authentication overview
- prefect-aws docs (ECS work pool/worker details)

If you share:
- whether you’re using private subnets (and NAT/VPC endpoints),
- the execution role attached (if any),
- and whether your ECR is cross-account,

…I can point at the exact missing piece, sparing us both more PENDING.
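For the cross-account case, the account that owns the repository has to name the execution role in the repository policy. A sketch that builds such a policy in Python, assuming pull-only access; the role ARN, account ID, and `Sid` are placeholders, and `cross_account_pull_policy` is a hypothetical helper:

```python
import json

# Repository-level actions needed to pull an image. ecr:GetAuthorizationToken is
# not repository-scoped, so it belongs in the execution role's own IAM policy
# (AmazonECSTaskExecutionRolePolicy already includes it), not in this policy.
PULL_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetDownloadUrlForLayer",
]


def cross_account_pull_policy(execution_role_arn: str) -> str:
    """Return an ECR repository policy (JSON) that lets one role pull images."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowEcsExecutionRolePull",  # hypothetical statement id
                "Effect": "Allow",
                "Principal": {"AWS": execution_role_arn},
                "Action": PULL_ACTIONS,
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder ARN for the execution role in the account that runs the tasks.
    print(cross_account_pull_policy("arn:aws:iam::210987654321:role/ecsTaskExecutionRole"))
```

You’d apply the resulting JSON in the registry account, for example with `aws ecr set-repository-policy --repository-name my-repo --policy-text file://policy.json` or boto3’s `ecr.set_repository_policy(repositoryName=..., policyText=...)`.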
d
@Marvin Yes, you're correct: it is a container pull error. Here's the error message I'm getting:
```
Task stopped at: 2025-08-12T17:22:00.603Z Task stopped with stop code: "TaskFailedToStart" and reason: "CannotPullContainerError: pull image manifest has been retried 7 time(s): failed to resolve ref {private_registry}: failed to do request: Head "{private_registry}": dial tcp {ip}: i/o timeout".
```
m
Thanks for the error details. That’s a network problem, not credentials. The “i/o timeout” on the HEAD request to the registry means the task can’t reach the registry endpoint at all. If it were auth, you’d see “no basic auth credentials” or an AccessDenied on ECR APIs.

Checklist to fix:
- If you’re on Fargate in private subnets:
  - Quick test: run the task in a public subnet (one with a route to an internet gateway) with `assignPublicIp: ENABLED`. A public IP does nothing in a subnet that has no internet gateway route. If pulls start working, it’s definitely a private-subnet-without-egress issue.
  - Proper fix (no public IP): either route those subnets through a NAT gateway, or add VPC endpoints:
    - Interface endpoints: `com.amazonaws.<region>.ecr.api` and `com.amazonaws.<region>.ecr.dkr`
    - Gateway endpoint: `com.amazonaws.<region>.s3`, associated with the task subnets’ route tables
    - If the container logs via the `awslogs` driver, also an interface endpoint for `com.amazonaws.<region>.logs`
  - Ensure VPC DNS hostnames/resolution are enabled (and private DNS on the interface endpoints), and that security groups allow 443:
    - Your task ENI SG must allow egress to the endpoint ENI SG
    - The endpoint ENI SG must allow inbound 443 from your task ENI SG
- Verify the execution role is set and has the managed policy `AmazonECSTaskExecutionRolePolicy` (needed for ECR API calls/logs). Wrong permissions usually don’t cause timeouts, but you need this anyway.
- Make sure the ECR registry and the ECS task are in the same region. VPC endpoints only serve their own region, so a cross‑region pull with no internet path times out.
- If this is a non‑ECR private registry, you still need outbound connectivity to that registry, and you’d use `repositoryCredentials` (Secrets Manager) in the ECS task definition. For ECR, you do not need `repositoryCredentials`.

How to set the quick test in your Prefect deployment (Prefect 3.x, ECS work pool):
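```yaml
work_pool:
  name: my-ecs-pool
  job_variables:
    network_configuration:
      awsvpcConfiguration:
        subnets:
          # For this test, use public subnets (route to an internet gateway);
          # a public IP alone does not help in a private subnet.
          - subnet-abc...
          - subnet-def...
        securityGroups:
          - sg-123...
        assignPublicIp: ENABLED
```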
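The network-vs-credentials call above comes from the wording of the stop reason. As a rough sketch of that rule of thumb, here is a Python helper that buckets a `stoppedReason` string (the field returned by boto3’s `ecs.describe_tasks(cluster=..., tasks=[task_arn])["tasks"][0]["stoppedReason"]`). `classify_stop_reason` is hypothetical, and the marker substrings are assumptions drawn from common error texts, not an official AWS list:

```python
# Substrings treated as evidence of each failure class. These are assumptions
# based on common CannotPullContainerError texts, not an exhaustive AWS list.
NETWORK_MARKERS = ("i/o timeout", "dial tcp", "context deadline exceeded")
AUTH_MARKERS = (
    "no basic auth credentials",
    "accessdenied",
    "not authorized",
    "pull access denied",
)


def classify_stop_reason(reason: str) -> str:
    """Bucket an ECS stoppedReason into 'network', 'auth', or 'unknown'."""
    text = reason.lower()
    # Check network first: a timeout means the registry was never reached,
    # so any credential problem would not have surfaced yet.
    if any(marker in text for marker in NETWORK_MARKERS):
        return "network"
    if any(marker in text for marker in AUTH_MARKERS):
        return "auth"
    return "unknown"


if __name__ == "__main__":
    reason = (
        "CannotPullContainerError: pull image manifest has been retried 7 time(s): "
        "failed to resolve ref {private_registry}: failed to do request: "
        'Head "{private_registry}": dial tcp {ip}: i/o timeout'
    )
    print(classify_stop_reason(reason))  # network
```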
If you’re keeping tasks private, configure endpoints or NAT and flip `assignPublicIp` back to DISABLED (and move the task back to your private subnets if you switched them for the test).

Helpful references
- ECS task execution role: AWS docs
- VPC endpoints for ECR: AWS docs
- Pulling ECR images from private subnets (needs ECR + S3): AWS docs

Can you share:
- Fargate or EC2 launch type?
- Whether the task subnets are private, and whether a NAT or VPC endpoints are configured?
- Your current `assignPublicIp` setting?

That’ll let me point to the exact missing piece.
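To check the endpoint half of the checklist, here is a sketch that compares what a private-subnet Fargate task needs for ECR pulls against the endpoints a VPC already has. It assumes you pass in the `ServiceName` values (for example from boto3’s `ec2.describe_vpc_endpoints()`); `required_endpoints` and `missing_endpoints` are hypothetical helpers, and the `logs` endpoint is only required when the container uses the `awslogs` driver:

```python
# Endpoints a Fargate task in private subnets (no NAT) needs to pull from ECR,
# keyed by service name, with the endpoint type used in the checklist.
def required_endpoints(region: str, uses_awslogs: bool = True) -> dict[str, str]:
    needed = {
        f"com.amazonaws.{region}.ecr.api": "Interface",
        f"com.amazonaws.{region}.ecr.dkr": "Interface",
        f"com.amazonaws.{region}.s3": "Gateway",  # ECR stores image layers in S3
    }
    if uses_awslogs:
        # Only needed when the container ships logs with the awslogs driver.
        needed[f"com.amazonaws.{region}.logs"] = "Interface"
    return needed


def missing_endpoints(region: str, existing: list[str], uses_awslogs: bool = True) -> list[str]:
    """Return required service names absent from `existing`, sorted for stable output."""
    have = set(existing)
    return sorted(name for name in required_endpoints(region, uses_awslogs) if name not in have)


if __name__ == "__main__":
    # Example: a VPC that has the ECR endpoints but is missing S3 (and logs).
    existing = ["com.amazonaws.us-east-1.ecr.api", "com.amazonaws.us-east-1.ecr.dkr"]
    for name in missing_endpoints("us-east-1", existing):
        print("missing:", name)
```

Having every endpoint is necessary but not sufficient: DNS settings and the 443 security-group rules from the checklist still have to line up.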