Hi all, I'm using the ECS Agent and running the ta...
# ask-community
w
Hi all, I'm using the ECS Agent and running the tasks using the [ECS run config](https://docs.prefect.io/api/latest/run_configs.html#ecsrun). I'm passing a custom task definition to the run configuration. I've realised that the execution role is picked up from the custom task definition (this does not match the behaviour described in the docs btw; they specify that
If not provided, the default on the agent will be used (if configured).
). This does not happen for the task role, which is instead passed from the agent. This behaviour is unexpected; I would have assumed that, when passing a custom task definition, the roles I have defined (task role and execution role) would not be overridden by Prefect. So, I've now modified the naming convention for the task role for a particular flow, to test passing it via: `task_role_arn (str, optional)`: The name or full ARN for the IAM role to use for this task. If not provided, the default on the agent will be used (if configured). The role I have passed has full S3 permissions:
{
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}
The flow itself uses the [S3Upload](https://docs.prefect.io/api/latest/tasks/aws.html#s3upload) action. It fails with
Error uploading to S3: An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
This makes absolutely no sense to me. For reference, I'm not providing any other AWS authentication to my code within the flow, or the S3Upload action, and the bucket exists within my account. I'm very familiar with AWS IAM interactions and this is quite confusing to me!
a
what storage do you use?
with that I mean the flow’s storage - e.g. S3, GitHub, …
k
Hey @Will, sorry I’m a bit confused. The code for the agent will prioritize the
task_role_arn
and
execution_role_arn
defined in the RunConfig. If there is none, it pulls the one defined on the agent. Are you saying you are seeing different behavior? Or are you saying this is unexpected?
What is the Prefect version of your registration and your image? I see fixes in 0.14.15
a
@Will I think that either you misunderstood the docs, or the docstring is defined in a confusing way, and we would then need to change that 🙂 My understanding is: • if you provide a custom
task_definition
to your ECSRun, then all values including the task role and execution role, will be taken from this task definition, not from the agent or other run config kwargs, • if you don’t provide a custom task definition, Prefect will create and register a new task definition (or a new revision of it) for you, based on the kwargs you provided to the
ECSRun
such as the
task_role_arn
and
execution_role_arn
- if those are not passed to the ECSRun, then Prefect will take those defined on the agent. So a custom task definition takes priority, then other ECSRun overrides, and if there are no overrides, the default values are taken from the agent definition. Does it make sense? I think, if you would provide a custom task definition, and on top of that specify task_role_arn and execution_role_arn, then there is ambiguity as to which of those values should the agent take when registering a new task definition for a flow.
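The precedence described above could be sketched in plain Python as follows. This only illustrates the intended ordering and is not Prefect's agent code; the function name `resolve_role` and its arguments are hypothetical.

```python
from typing import Optional


def resolve_role(
    task_definition_role: Optional[str],
    run_config_role: Optional[str],
    agent_default_role: Optional[str],
) -> Optional[str]:
    """Return the first role that is set, in the order described above.

    Hypothetical helper: it models the intended precedence only
    (custom task definition, then ECSRun kwargs, then agent default).
    """
    for candidate in (task_definition_role, run_config_role, agent_default_role):
        if candidate:
            return candidate
    return None
```

For example, `resolve_role(None, "arn:aws:iam::XXX:role/flowRole", "arn:aws:iam::XXX:role/agentRole")` returns the ECSRun value, because no custom task definition role is present.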
w
Sorry, I was out last night when you both replied, thanks for the responses. I've done some further testing today. As can be seen in the block linked by Kevin, the
RunTask
call receives the new roles as
overrides
, which should mean that they will take precedence over anything previously defined in the task definition. https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_RunTask.html#ECS-RunTask-request-overrides However, I've just retested and confirmed that the execution role does not get overridden by the execution role from the agent. I've verified by passing a parameter to the
secrets
argument in the container definition: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerDefinition.html The agent role doesn't have any secrets manager IAM permissions. The execution role on the custom task definition does; for this specific secret it has the
secretsmanager:GetSecretValue
permission. When I run the flow it is able to retrieve the secret and inject it as an environment variable. When I remove the permission from the custom task definition's execution role, and try to run the flow, it fails with:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 1 time(s): failed to fetch secret arn:aws:secretsmanager:us-east-1...
So clearly something isn't working as expected here. This is just the execution role behaviour.
To respond to Anna; the precedence based on the code linked by Kevin, and the behaviour described in the overrides definition above, should lead to the following: 1. If none of
task_definition
,
task_definition_path
,
task_definition_arn
,
task_role_arn
or
execution_role_arn
are defined on the run config, both execution and task roles should be copied from those applied to the agent. I haven't verified this. 2. If there is a custom task definition (
task_definition
,
task_definition_path
or
task_definition_arn
), then according to https://github.com/PrefectHQ/prefect/blob/e699fce534f77106e52dda1f0f0b23a3f8bcdf81/src/prefect/agent/ecs/agent.py#L454-L464 the custom task definition's execution and task roles should both be overridden by those found on the agent. As described above, this is definitely not happening for the execution role, and I can't figure out why not; the AWS api docs indicate that it should function as intended. As for the task role, it is overridden by the prefect agent's role in this case. 3. If
task_role_arn
or
execution_role_arn
are set, these should take priority. I can confirm from looking at the output of
DescribeTasks
that the task role passed here is successfully applied to the running task.
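To make the three cases concrete, here is a minimal sketch of how role overrides for `RunTask` could be assembled. It is not the agent's actual code; `build_run_task_kwargs` and its parameters are hypothetical names. The only parts taken from the AWS API are the `taskDefinition` and `overrides` request fields and the `taskRoleArn` / `executionRoleArn` keys of `TaskOverride`.

```python
from typing import Any, Dict, Optional


def build_run_task_kwargs(
    task_definition_arn: str,
    run_config_task_role: Optional[str] = None,
    run_config_execution_role: Optional[str] = None,
    agent_task_role: Optional[str] = None,
    agent_execution_role: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for an ECS RunTask call (hypothetical helper)."""
    overrides: Dict[str, Any] = {}

    # Run config value wins; otherwise fall back to the agent default.
    task_role = run_config_task_role or agent_task_role
    if task_role:
        overrides["taskRoleArn"] = task_role

    execution_role = run_config_execution_role or agent_execution_role
    if execution_role:
        overrides["executionRoleArn"] = execution_role

    # With no override for a role, ECS uses the one in the task definition.
    return {"taskDefinition": task_definition_arn, "overrides": overrides}
```

Under this model, a role from a custom task definition only survives when neither the run config nor the agent supplies a value for it.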
Now for the root cause of the issue! I discovered that if I manually set up a boto client and run the S3 calls, I can do everything I would expect. So this means that the
S3Upload
task is incorrectly propagating boto credentials. If I were to take a guess at what is happening, it would be something to do with the caching behaviour in https://github.com/PrefectHQ/prefect/blob/ca19b43947eb16e36fe9ea777e4e65c15baaf0b8/src/prefect/utilities/aws.py#L23-L129 I am using the
S3Upload
task like so (following the docs example, and simplified):
import boto3
import prefect
from prefect import Flow, task
from prefect.tasks.secrets import EnvVarSecret
from prefect.tasks.aws.s3 import S3Upload

@task
def test_s3(s3_bucket: str):
    logger = prefect.context.get("logger")

    # Plain boto3 client, relying on the default credential chain (the ECS task role)
    s3 = boto3.client("s3")

    list_buckets_response = s3.list_buckets()
    logger.info(f"Buckets: {[x['Name'] for x in list_buckets_response['Buckets']]}")

    res = s3.put_object(
        Bucket=s3_bucket, Key="test.json", Body=b'{"hello": "world"}'
    )


upload_task = S3Upload()

with Flow("hello_world") as flow:
    s3_bucket = EnvVarSecret("S3_BUCKET", raise_if_missing=True)

    test_s3(s3_bucket)

    upload_task('{"hello": "world"}', key="data.json", bucket=s3_bucket)

if __name__ == "__main__":
    flow.run()
test_s3
completes successfully.
upload_task
fails with the
PutObject
access denied exception detailed above.
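One way to narrow this down is to log which identity a given client is actually using. A minimal sketch, assuming a hypothetical helper name `current_identity_arn`; the only AWS call is STS `GetCallerIdentity`, and the client can be injected so the helper can be exercised without AWS access.

```python
from typing import Any, Optional


def current_identity_arn(sts_client: Optional[Any] = None) -> str:
    """Return the ARN of the identity behind the credentials in use."""
    if sts_client is None:
        # Imported lazily so the helper can be tested with a stub client.
        import boto3

        sts_client = boto3.client("sts")
    # GetCallerIdentity returns a dict with UserId, Account and Arn.
    return sts_client.get_caller_identity()["Arn"]
```

Calling this inside `test_s3` shows what the default boto3 credential chain resolves to in the ECS task. It does not by itself show which credentials `S3Upload` ends up with.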
k
I see. Nice job digging on the boto3. For the passing of the role, what version are you on?
w
So in conclusion the two problems I've run into: 1. Execution role behaviour is not the same as in the documentation. 2.
S3Upload
is not correctly propagating credentials in the ECS task environment, even though direct boto calls work correctly. Additionally, while not really a bug, I think it should work the way @Anna Geller describes, with a custom task definition taking priority over everything else. The current behaviour is unintuitive. This is obviously not a bug as it's in the docs 😛
@Kevin Kho the agent is on
prefecthq/prefect:0.15.6-python3.9
k
I’ll open a ticket after I dig a bit and summarize
w
One other thing that's come out of this, and would also be fixed by defaulting to the role from the task definition if one is provided, is that while you can pass a task definition family to the
ECSRun
config instead of a full ARN, you can't do this for the
task_role_arn
- even though it's specified in the Prefect docs that you can. https://docs.prefect.io/api/latest/run_configs.html#ecsrun See the parameter in here that it is passed to directly: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_TaskOverride.html This is quite a problem for us and will require a rework of how we usually deploy resources to be shared across the org. We use AWS Organizations, so each developer gets their own AWS account and we have a few for different environments. If it were possible to not have to pass a full ARN here, we would not be restricted on an account basis; so could share flows between the different accounts quite simply. It's still possible to do but will require a lot more fiddling!
a
You mean this: `task_definition_arn (str, optional)`: A pre-registered task definition ARN to use (either `family`, `family:version`, or a full task definition ARN). Should be changed to: `task_definition_arn (str, optional)`: A pre-registered task definition ARN to use - a full task definition ARN is required. Correct?
I will dig deeper later
w
@Anna Geller no the description is correct for
task_definition_arn
. The description for
task_role_arn
is not correct:
`task_role_arn (str, optional)`: The name or full ARN for the IAM role to use for this task. If not provided, the default on the agent will be used (if configured).
Only the full ARN will work here.
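Until role names are accepted, a workaround could be to build the full ARN per account before passing it to the run config. A minimal sketch, assuming a hypothetical helper `role_arn`; the account ID is supplied by the caller here (it could, for instance, come from STS `get_caller_identity()["Account"]` at registration time), and roles with an IAM path are not handled.

```python
def role_arn(role_name_or_arn: str, account_id: str, partition: str = "aws") -> str:
    """Expand a bare IAM role name into a full ARN (hypothetical helper).

    Values that already look like an ARN are returned unchanged.
    """
    if role_name_or_arn.startswith("arn:"):
        return role_name_or_arn
    return f"arn:{partition}:iam::{account_id}:role/{role_name_or_arn}"
```

With this, the same flow code can pass `role_arn("prefectTaskRole", account_id)` in each developer or environment account without hard-coding an account ID.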
a
got it, thx. So
task_role_arn="arn:aws:iam::XXX:role/prefectTaskRole"
works, but you’d prefer to only specify
task_role_arn="prefectTaskRole"
@Will FYI: I created an issue - feel free to chime in if you want to add some example, add more details, or if something is not clear: https://github.com/PrefectHQ/prefect/issues/5110
w
Thanks @Anna Geller, I posted a response. Might I also suggest a change in title; "ECSRun behaviour with custom task definitions and roles should be clarified" or similar to capture the whole issue.
I also did not post the point regarding the execution role behaviour I've experienced, which is that in the case of running a flow, it does use the execution role from the custom task definition. I thought adding that point to this issue would be confusing, and I'm not certain why it is happening (as the override appears to be passed correctly to
run_task
). It may be worth making a separate issue for this, I'm happy to do so if that's ok with you. EDIT: As I did not define a default role for flows on the agent (I missed that I didn't do this), this behaviour is expected.
a
thx, I updated the title. Sure, go ahead and create a new issue, thanks for that!
@Will last thing to finalize all issues from this thread: if you use your own boto3 code to upload a file rather than the S3Upload task, does it work as it should (i.e. taking the permissions from the IAM task role)? S3Upload is not optimized for running in-cluster so I’d suspect this may work better for you:
import boto3
from prefect import task

@task
def upload_file(file_name, bucket, object_name):
    s3_client = boto3.client("s3")
    s3_client.upload_file(file_name, bucket, object_name)
w
Yes, as mentioned above that's what I've had to do. According to the documentation for S3Upload, the authentication should work in the same way as boto3 if the Prefect secret for AWS auth is not provided (which in my case it was not, so the behaviour is incorrect). I imagine that anything else that uses the get_boto_client function will have the same issue.
a
hey y'all! don't want to zombie an old issue, but wasn't sure where to put this -- i can reproduce the issue will was seeing and i want to make sure i understand the expected behaviour correctly, so i laid out what i'm seeing here: https://github.com/PrefectHQ/prefect/issues/5110#issuecomment-1020720541
a
I answered in the issue