#prefect-community

Jan Bršťák

05/18/2022, 8:24 AM
Hello, we are running the ECS Agent with the `prefecthq/prefect:1.2.1-python3.9` image, and we are using Prefect Cloud. All seems to be fine with the Agent: we can see it in the UI, and in the CloudWatch logs it says `Deploying flow run…` and `Completed deployment of flow run` whenever we trigger a flow run. But no flows are getting started; the flow run gets stuck in Submitted every time. (It worked before, but we changed the agent image from `prefecthq/prefect:0.14.13-python3.8` and also changed the roles in AWS, which have the same permissions as the originals.) There are no errors and nothing in the logs. I’m not really sure what to do. Thanks for any help. The run config looks like this:
RUN_CONFIG = ECSRun(
    labels=["prod"],
    task_role_arn="arn:aws:iam::XXX:role/prefectTaskRole",
    execution_role_arn="arn:aws:iam::XXX:role/prefectTaskExecutionRole",
    task_definition_arn="prefect-task:4",
    run_task_kwargs=dict(
        cluster="XXX",
    ),
)
The role permissions are as follows:
prefectTaskExecutionRole
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
prefectTaskRole
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CreateSecurityGroup",
        "ec2:CreateTags",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "ec2:DeleteSecurityGroup",
        "ecs:CreateCluster",
        "ecs:DeleteCluster",
        "ecs:DeregisterTaskDefinition",
        "ecs:DescribeClusters",
        "ecs:DescribeTaskDefinition",
        "ecs:DescribeTasks",
        "ecs:ListAccountSettings",
        "ecs:ListClusters",
        "ecs:ListTaskDefinitions",
        "ecs:RegisterTaskDefinition",
        "ecs:RunTask",
        "ecs:StopTask",
        "iam:PassRole",
        "logs:CreateLogStream",
        "logs:CreateLogGroup",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:GetLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
So iam:PassRole is included. Here is an additional S3 bucket policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::janb-prefect-datasets/*"
      ]
    }
  ]
}
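One way to sanity-check policies like the ones above is to test them against a list of actions you expect to need. The following is only a minimal sketch: the function names and the `required` list are hypothetical and purely illustrative, and it looks only at `Allow` statements and `Action` patterns. It ignores `Resource`, `Condition`, `NotAction`, and explicit `Deny` statements, so it is no substitute for the IAM policy simulator.

```python
from fnmatch import fnmatchcase


def allowed_action_patterns(policy):
    """Collect the Action patterns from every Allow statement in a policy dict."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be given as a string
            actions = [actions]
        patterns.extend(actions)
    return patterns


def missing_actions(policy, required):
    """Return the required actions that no Allow pattern in the policy covers.

    Action names are compared case-insensitively, and '*' in a pattern
    matches any run of characters (for example 's3:*').
    """
    patterns = [p.lower() for p in allowed_action_patterns(policy)]
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


# Illustrative input only: a cut-down policy and a hypothetical list of
# actions to look for. Adjust both to your own setup.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ecs:RunTask", "ecs:StopTask", "iam:PassRole"],
            "Resource": "*",
        }
    ],
}
required = ["ecs:RunTask", "ecs:RegisterTaskDefinition", "iam:PassRole"]
print(missing_actions(example_policy, required))
```

To check a real policy, load the JSON document with `json.load` and pass the resulting dict in place of `example_policy`.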

Anna Geller

05/18/2022, 10:45 AM
Stuck in Submitted indicates an issue in the execution layer. Your setup looks good; I wonder whether it just takes time for AWS to provision the actual resources. Do you see ECS tasks being spun up in the ECS console when you trigger a flow run? You said something interesting: if you are migrating your setup from 0.14.13 to 1.2.1, you may need to adjust a couple of things that changed since then, e.g. this new version no longer uses auth tokens and uses API keys instead.

Jan Bršťák

05/18/2022, 10:50 AM
Thank you for the reply. ECS tasks are not even being started (no log groups in CloudWatch, nothing, really weird...). I’ve let it run for a while several times, and it just timed out (3 retries by Lazarus, then it ended). I will take a look at the things that changed between those versions; maybe there is something we need to do.

Anna Geller

05/18/2022, 11:20 AM
Yes, I think it may be that the flow run container cannot start due to issues authenticating to Prefect Cloud, because the auth method changed from tokens to API keys. Keep us posted on how it goes 👍
👍 1
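A quick way to spot a leftover token-based setup after such an upgrade is to inspect the environment variables passed to the agent or flow run container. This is a minimal sketch and not an official Prefect utility: the function name is hypothetical, and the variable names are assumptions based on Prefect 1.x's `PREFECT__SECTION__KEY` convention for config overrides, so verify them against the Prefect documentation for your version.

```python
# Assumed variable names (check against the Prefect docs for your version):
# legacy token settings used by older agents, and the API key setting that
# replaced them.
LEGACY_TOKEN_VARS = (
    "PREFECT__CLOUD__AUTH_TOKEN",
    "PREFECT__CLOUD__AGENT__AUTH_TOKEN",
)
API_KEY_VAR = "PREFECT__CLOUD__API_KEY"


def check_cloud_auth_env(env):
    """Return a list of human-readable findings for a mapping of env vars.

    An empty list means no legacy token variable is set and an API key
    variable is present. It does not verify that the key itself is valid.
    """
    findings = []
    for name in LEGACY_TOKEN_VARS:
        if env.get(name):
            findings.append(f"{name} is set: legacy auth token, replace with an API key")
    if not env.get(API_KEY_VAR):
        findings.append(f"{API_KEY_VAR} is not set")
    return findings
```

Calling `check_cloud_auth_env(dict(os.environ))` inside the container, or passing the environment section of an ECS task definition converted to a dict, would list any leftovers from the token-based setup.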

Jan Bršťák

05/18/2022, 1:20 PM
I’ve redeployed the Agent, created a new API key and a new ECS task, a completely new deployment. And tasks are now starting 🙂 Thank you for the help.

Anna Geller

05/18/2022, 2:52 PM
wow, amazing work! so glad you figured it out, well done! 👏