<@ULVA73B9P> I have been following <this guide> to...
# ask-marvin
b
@Marvin I have been following this guide to create an AWS ECS based worker. I have been able to deploy the Prefect worker in ECS, and the work pool is active and ready in the Prefect UI. When I click the "quick run" button to execute a deployment associated with the work pool, the job does get picked up. However, it then crashes almost immediately with the following error:
Worker 'ECSWorker f6800430-25be-4916-aa21-12adf6de4c7c' submitting flow run '5b570f0e-7605-4982-8bed-0bc68286fa19'
Failed to submit flow run '5b570f0e-7605-4982-8bed-0bc68286fa19' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/prefect/workers/base.py", line 1262, in _submit_run_and_capture_errors
    result = await self.run(
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect_aws/workers/ecs_worker.py", line 711, in run
    ecs_client = await run_sync_in_worker_thread(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 233, in run_sync_in_worker_thread
    result = await anyio.to_thread.run_sync(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2470, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 967, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 243, in call_with_mark
    return call()
           ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect_aws/workers/ecs_worker.py", line 767, in _get_client
    return configuration.aws_credentials.get_client(client_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect_aws/credentials.py", line 165, in get_client
    return _get_client_cached(ctx=self, client_type=client_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect_aws/credentials.py", line 49, in _get_client_cached
    client = ctx.get_boto3_session().client(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect_aws/credentials.py", line 141, in get_boto3_session
    return boto3.Session(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/boto3/session.py", line 108, in __init__
    self._setup_loader()
  File "/usr/local/lib/python3.12/site-packages/boto3/session.py", line 149, in _setup_loader
    self._loader = self._session.get_component('data_loader')
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/session.py", line 816, in get_component
    return self._components.get_component(name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/session.py", line 1198, in get_component
    self._components[name] = factory()
                             ^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/session.py", line 201, in <lambda>
    lambda: create_loader(self.get_config_variable('data_path')),
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/session.py", line 325, in get_config_variable
    return self.get_component('config_store').get_config_variable(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/configprovider.py", line 502, in get_config_variable
    return provider.provide()
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/configprovider.py", line 708, in provide
    value = provider.provide()
            ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/configprovider.py", line 795, in provide
    scoped_config = self._session.get_scoped_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/session.py", line 424, in get_scoped_config
    raise ProfileNotFound(profile=profile_name)
botocore.exceptions.ProfileNotFound: The config profile (prefect) could not be found
Reported flow run '5b570f0e-7605-4982-8bed-0bc68286fa19' as crashed: Flow run could not be submitted to infrastructure:
ProfileNotFound('The config profile (prefect) could not be found')
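The last frames of the traceback show botocore raising `ProfileNotFound` while `boto3.Session` is being constructed, which suggests that a profile name of `prefect` reached the session from somewhere. Below is a minimal sketch for narrowing down where, assuming it is run inside the worker container. It reports the profile-related environment variables that botocore reads and the profile names defined in an AWS config file. The helper names are hypothetical, and the sketch does not inspect another possible source: the `profile_name` field of an `AwsCredentials` block attached to the work pool.

```python
import configparser
import os
from typing import Dict, Mapping, Optional, Set

# Environment variables from which botocore can pick up a profile name.
PROFILE_ENV_VARS = ("AWS_PROFILE", "AWS_DEFAULT_PROFILE")


def profile_env_hints(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the profile-related environment variables that are set."""
    env = os.environ if env is None else env
    return {name: env[name] for name in PROFILE_ENV_VARS if env.get(name)}


def profile_names(config_text: str) -> Set[str]:
    """List profile names defined in the text of an AWS config file.

    The config file names its sections "[default]" or "[profile NAME]".
    """
    parser = configparser.RawConfigParser()
    parser.read_string(config_text)
    names = set()
    for section in parser.sections():
        if section == "default":
            names.add(section)
        elif section.startswith("profile "):
            names.add(section[len("profile "):].strip())
    return names


if __name__ == "__main__":
    print("profile env vars:", profile_env_hints())
    config_path = os.environ.get(
        "AWS_CONFIG_FILE", os.path.expanduser("~/.aws/config")
    )
    if os.path.exists(config_path):
        with open(config_path) as handle:
            print("profiles in config:", sorted(profile_names(handle.read())))
    else:
        print("no AWS config file at", config_path)
```

If neither check turns up `prefect` and the error persists, the profile name is probably being passed to the session explicitly, for example from a saved credentials block or the work pool's job configuration.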
m
thought for 51.1 seconds
Based on the research and the error message you provided, the issue is related to AWS credentials configuration. The error indicates that the ECS worker is trying to use a profile named "prefect" that doesn't exist. Here's how to resolve this:
1. Option 1: Use IAM Role (Recommended)
   - Since you're running in ECS, the best practice is to use IAM roles. Make sure your ECS task has an IAM role with the necessary permissions.
   - The task role should have permissions for:
     - ECS operations
     - ECR (if pulling from ECR)
     - CloudWatch Logs
     - Any other AWS services your flows need to access
2. Option 2: Configure AWS Credentials
   If you need to use AWS credentials directly, you can:
   a. Use environment variables in your ECS task definition:
   {
     "environment": [
       {
         "name": "AWS_ACCESS_KEY_ID",
         "value": "your_access_key"
       },
       {
         "name": "AWS_SECRET_ACCESS_KEY",
         "value": "your_secret_key"
       },
       {
         "name": "AWS_DEFAULT_REGION",
         "value": "your_region"
       }
     ]
   }
   b. Or use an AWS credentials block in Prefect:
   from prefect_aws import AwsCredentials
   
   aws_credentials = AwsCredentials(
       aws_access_key_id="your_access_key",
       aws_secret_access_key="your_secret_key"
   )
   aws_credentials.save("default")
3. Verify Configuration
   - Check if your ECS task role has the necessary permissions
   - Verify that your AWS credentials are properly configured
   - Make sure your worker can access the necessary AWS services

The recommended approach is Option 1 (using IAM roles) as it's more secure and follows AWS best practices. You shouldn't need to explicitly configure AWS profiles when running in ECS if your task role is properly configured.

Would you like me to:
1. Research more specific details about the required IAM permissions for your use case?
2. Help you troubleshoot your current ECS task role configuration?
3. Provide more information about setting up AWS credentials in Prefect?

Please let me know which aspect you'd like to explore further, and I'll be happy to help!
b
@Marvin I am running a self-hosted instance of Prefect that is deployed on an EC2 instance in AWS. The ECS worker that is picking up the jobs has the following task definition JSON:
{
  "taskDefinitionArn": "arn:aws:ecs:us-west-2:xxxxxxxxxx:task-definition/prefect-worker-task:1",
  "containerDefinitions": [
    {
      "name": "prefect-worker",
      "image": "prefecthq/prefect:3-latest",
      "cpu": 512,
      "memory": 1024,
      "portMappings": [],
      "essential": true,
      "command": [
        "/bin/sh",
        "-c",
        "pip install prefect-aws && prefect worker start --pool my-ecs-worker --type ecs"
      ],
      "environment": [
        {
          "name": "PREFECT_API_URL",
          "value": "https://prefect.example.com/api"
        }
      ],
      "mountPoints": [],
      "volumesFrom": [],
      "systemControls": []
    }
  ],
  "family": "prefect-worker-task",
  "taskRoleArn": "arn:aws:iam::xxxxxxxxxx:role/ecsTaskExecutionRole",
  "executionRoleArn": "arn:aws:iam::xxxxxxxxxx:role/ecsTaskExecutionRole",
  "networkMode": "awsvpc",
  "revision": 1,
  "volumes": [],
  "status": "ACTIVE",
  "requiresAttributes": [
    {
      "name": "com.amazonaws.ecs.capability.task-iam-role"
    },
    {
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "name": "ecs.capability.task-eni"
    }
  ],
  "placementConstraints": [],
  "compatibilities": [
    "EC2",
    "FARGATE"
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "cpu": "512",
  "memory": "1024",
  "registeredAt": "2025-08-04T16:52:06.012Z",
  "registeredBy": "arn:aws:iam::xxxxxxxxxx:user/me",
  "tags": []
}
The ECS task has both a `taskRoleArn` and an `executionRoleArn` role defined. This `ecsTaskExecutionRole` IAM role has the AWS managed policy `AmazonECSTaskExecutionRolePolicy` attached and the following trust policy:
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
In theory this should be enough, but I think the problem is related to the fact that the Prefect startup logic is trying to find a profile named `prefect`, which does not exist:
ProfileNotFound: The config profile (prefect) could not be found
Is this code being executed on the ECS worker? If so, I believe it should have its own credentials as a result of the attached IAM role.
m
thought for 45.4 seconds
Based on your setup and the research, I can identify the issue:
1. Yes, this code is being executed on the ECS worker itself, not in the tasks it launches. This is important because the error is happening when the worker is trying to submit flow runs, not during the actual flow execution.
2. The problem is that the ECS worker needs its own AWS credentials configuration to create and manage ECS tasks. While your task role (`ecsTaskExecutionRole`) is correctly configured for task execution, the worker itself needs to be able to interact with ECS to launch these tasks.

Here's how to fix this:
1. Option 1: Configure AWS Default Credentials (Recommended)
   Modify your task definition to include these environment variables:
   "environment": [
     {
       "name": "PREFECT_API_URL",
       "value": "https://prefect.example.com/api"
     },
     {
       "name": "AWS_DEFAULT_REGION",
       "value": "us-west-2"
     },
     {
       "name": "AWS_USE_DEFAULT_CREDENTIALS",
       "value": "true"
     }
   ]
   The `AWS_USE_DEFAULT_CREDENTIALS=true` environment variable tells the AWS SDK to use the task's IAM role credentials instead of looking for a profile.
2. Option 2: Add Required Permissions
   Your task role (`ecsTaskExecutionRole`) needs additional permissions. Create a custom policy and attach it:
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "ecs:RunTask",
                   "ecs:StopTask",
                   "ecs:DescribeTasks",
                   "iam:PassRole"
               ],
               "Resource": "*"
           }
       ]
   }
3. Verify Task Role vs. Execution Role
   - Your `taskRoleArn` and `executionRoleArn` are currently the same. While this can work, it's generally better to separate them:
     - `executionRoleArn`: Used by ECS to pull container images and write logs (your current `AmazonECSTaskExecutionRolePolicy` is good for this)
     - `taskRoleArn`: Should have permissions for both:
       - The worker to manage ECS tasks
       - The actual flow runs to access whatever AWS services they need

The error you're seeing suggests that the AWS SDK is falling back to looking for a profile because it's not properly picking up the task's IAM role credentials. Adding `AWS_USE_DEFAULT_CREDENTIALS=true` should resolve this by explicitly telling it to use the IAM role credentials.

Would you like me to:
1. Provide a complete updated task definition with all these changes?
2. Help you create the proper IAM policies for separation of concerns between task and execution roles?
3. Explain more about how to debug AWS credential issues in ECS?
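The role comparison described above can be checked mechanically. The following is a minimal sketch, assuming the task definition JSON has been saved locally and loaded into a dict (for example with `json.load`). The function name `summarize_task_definition` is hypothetical; the keys it reads are the ones that appear in the task definition posted earlier in this thread.

```python
import json
from typing import Any, Dict


def summarize_task_definition(task_def: Dict[str, Any]) -> Dict[str, Any]:
    """Report the two role ARNs and the environment variable names in use."""
    task_role = task_def.get("taskRoleArn")
    execution_role = task_def.get("executionRoleArn")
    env_names = sorted(
        entry["name"]
        for container in task_def.get("containerDefinitions", [])
        for entry in container.get("environment", [])
    )
    return {
        "task_role": task_role,
        "execution_role": execution_role,
        # True when one IAM role is doing both jobs, as in this thread.
        "roles_are_identical": task_role is not None and task_role == execution_role,
        "environment_names": env_names,
    }


if __name__ == "__main__":
    # Trimmed copy of the task definition from the thread.
    sample = {
        "containerDefinitions": [
            {
                "name": "prefect-worker",
                "environment": [
                    {"name": "PREFECT_API_URL", "value": "https://prefect.example.com/api"}
                ],
            }
        ],
        "taskRoleArn": "arn:aws:iam::xxxxxxxxxx:role/ecsTaskExecutionRole",
        "executionRoleArn": "arn:aws:iam::xxxxxxxxxx:role/ecsTaskExecutionRole",
    }
    print(json.dumps(summarize_task_definition(sample), indent=2))
```

The list of environment variable names is also a quick way to confirm that nothing profile-related, such as `AWS_PROFILE`, is set on the worker container.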
b
@Marvin I updated the ECS task configuration to include the suggested environment variables, specifically the `AWS_USE_DEFAULT_CREDENTIALS=true` variable. When I run the deployment, I still get the same error:
Reported flow run '39039aac-5a41-4cd4-abc3-9eb5d5867047' as crashed: Flow run could not be submitted to infrastructure:
ProfileNotFound('The config profile (prefect) could not be found')
I think the problem is not related to missing permissions on the role. Rather, the environment is not identifying the AWS credentials at all and is trying to load them from a config profile named `prefect`, which doesn't exist. It seems to me to be an issue with the boto3 logic and how the SDK is trying to get the variables. Maybe it is not prepared to use the instance roles that are applied to the task environment, and I should instead pass the AWS key and secret key directly in the env vars?
m
thought for 37.8 seconds
Your analysis is correct! The issue is more fundamental than we initially thought. Looking at the error and the implementation details, here's what's happening:
1. The `prefect-aws` package is looking specifically for a profile named "prefect" in boto3's configuration, rather than following the standard AWS credential chain.
2. This is happening before it even gets to the point of trying to use the ECS task role credentials.

Here's how we can fix this. You have two options:

Option 1: Configure AWS Credentials Block (Recommended)
from prefect_aws import AwsCredentials

# Create and save an AWS credentials block.
# Option A: use specific credentials.
aws_creds = AwsCredentials(
    aws_access_key_id="your_access_key",
    aws_secret_access_key="your_secret_key",
)

# OR Option B: use the instance/task role (preferred). AwsCredentials has no
# `use_default_credentials` field; leaving the key fields and `profile_name`
# unset lets boto3 fall back to its default credential chain.
# aws_creds = AwsCredentials()

aws_creds.save("default")  # Save with name "default"
Then update your task definition to include the name of the saved credentials block:
"environment": [
  {
    "name": "PREFECT_API_URL",
    "value": "https://prefect.example.com/api"
  },
  {
    "name": "PREFECT_AWS_CREDENTIALS_BLOCK",
    "value": "default"
  }
]
Option 2: Direct Environment Variables
If you prefer not to use blocks, you can set these environment variables in your task definition:
"environment": [
  {
    "name": "PREFECT_API_URL",
    "value": "https://prefect.example.com/api"
  },
  {
    "name": "AWS_DEFAULT_REGION",
    "value": "us-west-2"
  },
  {
    "name": "PREFECT_AWS_USE_INSTANCE_CREDENTIALS",
    "value": "true"
  }
]
The key difference from our previous attempt is that we're now explicitly telling Prefect how to handle AWS credentials, rather than just telling boto3. The `PREFECT_AWS_USE_INSTANCE_CREDENTIALS=true` environment variable is specifically designed for this use case where you want to use the task's IAM role.

I recommend trying Option 1 first, as it's the more robust solution and gives you more flexibility in how credentials are managed. The credentials block approach also makes it easier to debug issues as you can test the credentials configuration independently of the worker setup.

Would you like me to:
1. Provide more details about setting up the AWS credentials block?
2. Show you how to verify the credentials are working correctly?
3. Help you troubleshoot if these solutions don't resolve the issue?
b
@Marvin are you sure this environment variable `PREFECT_AWS_CREDENTIALS_BLOCK` really exists? I have updated the task definition but am still getting the same error.
m
thought for 47.3 seconds
Let me revise my approach. The issue appears to be more fundamental. Here's what we should try:
1. Update your ECS task definition to include the standard AWS environment variables (`AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` is automatically set by ECS, but we can explicitly set it):
"environment": [
  {
    "name": "PREFECT_API_URL",
    "value": "https://prefect.example.com/api"
  },
  {
    "name": "AWS_DEFAULT_REGION",
    "value": "us-west-2"
  },
  {
    "name": "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
    "value": "/v2/credentials"
  }
]
2. Modify the IAM role to ensure it has the necessary permissions:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecs:RunTask",
                "ecs:StopTask",
                "ecs:DescribeTasks",
                "iam:PassRole",
                "ecs:ListTasks",
                "ecs:DescribeTaskDefinition"
            ],
            "Resource": "*"
        }
    ]
}
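To compare a policy document against a list of actions like the one above, a small check can help. This is a minimal sketch with hypothetical helper names. It matches action names literally, so it deliberately ignores wildcards, `Deny` statements, `Resource` scoping, and conditions, all of which IAM does evaluate.

```python
from typing import Any, Dict, Iterable, List, Set


def allowed_actions(policy: Dict[str, Any]) -> Set[str]:
    """Collect the action names listed in the policy's Allow statements."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    actions: Set[str] = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        listed = statement.get("Action", [])
        if isinstance(listed, str):  # a single action may be a bare string
            listed = [listed]
        actions.update(listed)
    return actions


def missing_actions(policy: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required actions that the policy does not list, sorted."""
    return sorted(set(required) - allowed_actions(policy))


if __name__ == "__main__":
    # The shorter policy suggested earlier in the thread.
    earlier_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ecs:RunTask", "ecs:StopTask", "ecs:DescribeTasks", "iam:PassRole"],
                "Resource": "*",
            }
        ],
    }
    # The longer action list from the policy directly above.
    required = [
        "ecs:RunTask", "ecs:StopTask", "ecs:DescribeTasks",
        "iam:PassRole", "ecs:ListTasks", "ecs:DescribeTaskDefinition",
    ]
    print(missing_actions(earlier_policy, required))
```

Run against the two policies from this thread, it prints the two actions that the second policy adds over the first.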
3. Try using explicit credentials in your task definition (as a temporary test to isolate the issue):
"environment": [
  {
    "name": "AWS_ACCESS_KEY_ID",
    "value": "your-access-key"
  },
  {
    "name": "AWS_SECRET_ACCESS_KEY",
    "value": "your-secret-key"
  }
]
I apologize for my earlier incorrect information about the `PREFECT_AWS_CREDENTIALS_BLOCK` environment variable. Could you try these steps and let me know what happens? Also, it would be helpful to know:
1. Can you run `aws sts get-caller-identity` within the container to verify the IAM role is being picked up?
2. Are there any other error messages in the logs besides the ProfileNotFound error?

This will help us better understand where exactly the credential chain is breaking down.
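If the AWS CLI is not installed in the worker image, a standard-library check can at least show whether ECS has exposed task-role credentials to the container. This is a minimal sketch with a hypothetical function name. It relies on ECS injecting `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` into containers that have a task role, with credentials served from `169.254.170.2`. The value ECS injects normally includes a per-task identifier, so the path used in the comment below is made up.

```python
import os
from typing import Mapping, Optional

# Link-local address of the ECS container credentials endpoint.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def container_credentials_url(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the URL the SDK would query for task-role credentials, if any."""
    env = os.environ if env is None else env
    relative_uri = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if relative_uri:
        # e.g. "/v2/credentials/<task-specific id>" (illustrative value)
        return ECS_CREDENTIALS_HOST + relative_uri
    full_uri = env.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")
    if full_uri:
        return full_uri
    return None


if __name__ == "__main__":
    url = container_credentials_url()
    if url is None:
        print("No container credentials variable is set; no task role is visible here.")
    else:
        print("Task-role credentials should be available at:", url)
```

Even when this reports a URL, a profile name passed explicitly to the session is still looked up first and fails if it does not exist, which is consistent with the `ProfileNotFound` error in this thread.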