# ask-community
c
Hello, we use ECS to run our flows and lately I've been noticing that the task definition is created and immediately becomes [INACTIVE]. It's particularly strange because the inactive task definition shows "none" for the task role, but there is a role attached. I'll add a screenshot in the thread. The flow finishes successfully and writes to an S3 bucket that requires the role permissions. Any ideas what could be happening?
k
Hey @Carter Kwon, this was a previously registered flow and it just happened recently? How do you set the task role?
c
This example is a previously registered flow, but it also happens on new flows. I'm not sure exactly when it started. I only noticed it because, unless you look in the ECS console, everything appears to be working normally. The flows work as expected, using permissions they could only have through those roles. We use Terraform to create roles for each flow. The role ARNs are set as environment variables and picked up by the flow code during registration in the CI/CD pipeline, like this:
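```python
import os

from prefect.run_configs import ECSRun

# `flow` is the Flow object defined earlier in the flow file.
# Role ARNs come from the CI/CD environment (roles are created per flow by Terraform).
flow.run_config = ECSRun(
    task_role_arn=os.getenv("TASK_ROLE_ARN"),
    execution_role_arn=os.getenv("EXECUTION_ROLE_ARN"),
)
```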
The rest of the task definition settings are set on the agent using this Terraform block:
```hcl
resource "aws_ecs_task_definition" "agent_definition" {
  family                   = "etl-prefect-${var.prefect_cluster}-agent"
  task_role_arn            = aws_iam_role.agent_role.arn
  execution_role_arn       = aws_iam_role.agent_execution_role.arn
  network_mode             = "awsvpc"
  cpu                      = "256"
  memory                   = "512"
  requires_compatibilities = ["FARGATE"]
  container_definitions = jsonencode([
    {
      "image" : "prefecthq/prefect:latest",
      "name" : "agent",
      "logConfiguration" : {
        "logDriver" : "awslogs",
        "secretOptions" : null,
        "options" : {
          "awslogs-group" : "/ecs/etl-prefect-${var.prefect_cluster}-agent",
          "awslogs-region" : "us-west-2",
          "awslogs-stream-prefix" : "ecs"
        }
      },
      "portMappings" : [],
      # Prefect API token from SSM, exposed to the agent as $TOKEN
      "secrets" : [
        {
          "name" : "TOKEN",
          "valueFrom" : "${aws_ssm_parameter.ecs_token.arn}"
        }
      ],
      # entryPoint is declared exactly once (a second "entryPoint" key would be a duplicate)
      "entryPoint" : [
        "sh",
        "-c"
      ],
      # Write the network settings for flow-run tasks, then start the agent with them
      "command" : [
        join(" ", ["sh", "-c",
          <<-COMMANDBLOCK
        'cat > run_task_kwargs.yaml <<EOF
        networkConfiguration:
            awsvpcConfiguration:
                assignPublicIp: DISABLED
                securityGroups:
                    - ${aws_security_group.agent_sg.id}
                subnets:
                    - ${tolist(data.aws_subnet_ids.private.ids)[0]}
                    - ${tolist(data.aws_subnet_ids.private.ids)[1]}
        EOF
        prefect agent ecs start --cluster ${aws_ecs_cluster.prefect_tasks.arn} --token $TOKEN --label ${var.prefect_cluster} --run-task-kwargs ./run_task_kwargs.yaml'
        COMMANDBLOCK
      ])]
    }
  ])
}
```
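The registration snippet earlier in this message reads the role ARNs with `os.getenv`, which returns `None` when a variable is missing, so a misconfigured pipeline would register the flow with no task role and raise no error. That did not turn out to be the cause in this thread (the running tasks did get the role), but a fail-fast check is cheap. A minimal sketch, assuming the same `TASK_ROLE_ARN` / `EXECUTION_ROLE_ARN` variable names; `role_arns_from_env` is a hypothetical helper, not a Prefect API, and the ARN regex is only a loose approximation of IAM role ARNs:

```python
import os
import re
from typing import Dict, Mapping

# Loose pattern for IAM role ARNs (aws, aws-cn, aws-us-gov partitions; optional role path).
_ROLE_ARN = re.compile(r"^arn:aws[a-zA-Z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def role_arns_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: build ECSRun role kwargs, failing fast if a variable is unset or malformed."""
    kwargs: Dict[str, str] = {}
    for env_name, kwarg in (
        ("TASK_ROLE_ARN", "task_role_arn"),
        ("EXECUTION_ROLE_ARN", "execution_role_arn"),
    ):
        value = env.get(env_name)
        if not value:
            # os.getenv would silently return None here; stop registration instead.
            raise RuntimeError(f"{env_name} is not set; refusing to register without it")
        if not _ROLE_ARN.match(value):
            raise ValueError(f"{env_name}={value!r} does not look like an IAM role ARN")
        kwargs[kwarg] = value
    return kwargs
```

In the flow file this would be used as `flow.run_config = ECSRun(**role_arns_from_env())`, so a missing variable breaks the CI job rather than producing a flow that silently runs without its role.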
k
Do you see logs from the Inactive task?
c
In Prefect, yes.
k
I mean from this tab when you click into the Task on ECS
c
I'm not seeing a logs tab
k
How about here? Either way, it's really weird that your task role isn't showing up. Maybe the thing to try is to run an ECS agent locally, kick off a flow run, and see whether the task role makes it to ECS?
c
We don't have task logs set up (it's on the to do list). We just rely on the Prefect logs. This is an example of an inactive definition
This is a normal one. I noticed there are additional env vars in the normal one. I wonder what would cause that?
I can't replicate the situation exactly locally because this is in a prod account where everyone has read-only. I'll try to see if I can replicate it in a sandbox account.
@Kevin Kho I created a similar setup in another AWS account and I'm still running into this issue. Any ideas what other steps I can take to troubleshoot? I'm pretty stumped.
A couple of things to note that may be helpful:
• It looks like this started around the time I restarted our deployed ECS agent (running as an ECS service). That agent pulls the `prefecthq/prefect:latest` image, so it probably got a newer version of Prefect.
• Every time I click "Run" in Prefect, a new task definition is created, even though no new flow version was registered.
• When a flow is running in ECS, the running task's details page (not the task definition) shows the correct role attached. However, the task definition for that running task still shows "None" for the task role.
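The last bullet (role visible on the running task, "None" on its task definition) would be consistent with the task role being passed as a RunTask override (`overrides.taskRoleArn`) rather than written into the registered task definition. That is an assumption about the agent's behavior, not something confirmed in this thread. The sketch below compares the two places, taking dicts shaped like the responses of the ECS DescribeTaskDefinition and DescribeTasks APIs; `where_is_task_role` is a hypothetical helper:

```python
from typing import Any, Dict, Optional


def where_is_task_role(
    task_definition_response: Dict[str, Any],
    describe_tasks_response: Dict[str, Any],
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: compare the task role on a task definition with the one on a running task.

    Inputs are assumed to have the shape returned by ECS DescribeTaskDefinition
    and DescribeTasks (only the first task in the DescribeTasks response is inspected).
    """
    definition = task_definition_response["taskDefinition"]
    task = describe_tasks_response["tasks"][0]
    return {
        # ACTIVE or INACTIVE revision status.
        "definition_status": definition.get("status"),
        # Role stored on the task definition revision itself (None if absent).
        "definition_role": definition.get("taskRoleArn"),
        # Role supplied as a per-run override when the task was started.
        "override_role": task.get("overrides", {}).get("taskRoleArn"),
    }
```

With boto3, the inputs would come from `ecs.describe_task_definition(taskDefinition=...)` and `ecs.describe_tasks(cluster=..., tasks=[...])` on an `ecs = boto3.client("ecs")` client. A `None` for `definition_role` next to a populated `override_role` would match what the console showed.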
k
Hey, what was your previous version, and how did you update the service? I think this is authentication-related. What is the new version?
c
That could make sense. I noticed this message when starting the agent locally and we use the same process on the deployed instance
Client was created with an API token configured for authentication. API tokens are deprecated, please use API keys instead.
I'll try using the new service account authentication and see if that changes things
k
Check this thread. Docs are currently a bit off.
c
I created a service account along with a key and started the agent with `--key <key>`, and it got rid of the deprecated token warning, but not the [INACTIVE] task definition problem.
I also just upgraded Prefect this morning, so I'm on 0.15.3.
k
I think there was a change in ECSRun to re-register the task definition so that changes to the RunConfig take effect. Do you not see any logs when you click into the task?
c
@Kevin Kho Hi Kevin, I've been looking back into this issue. Based on what you just said and this conversation here, I'm starting to think that the task definition immediately becoming [INACTIVE] is the new expected behavior. That would make sense especially since the flows seem to work as expected otherwise. I just wanted to confirm that the de-registration was a somewhat recent change and not a sign of an issue on our end.
k
I made an ECS flow yesterday and saw the same behavior. Yes, I think this is expected.
c
Thanks for all your time and help on this. That's good to hear. I was starting to think I was going crazy. I'm still curious why it doesn't show the task role as attached on the inactive task definition, but if it works it works 🤷‍♂️
k
That part…I just checked and see a role on mine lol
I set it both on the RunConfig and task-definition though. My example is here if you wanna compare?
c
Thanks for sharing that example. I just tested setting it on both the RunConfig and task-definition. The role stuck on the inactive definition as expected. It's strange that it makes a difference, but I'm glad to know that is the "issue". Aside from the role not sticking to the inactive task definition, do you see any problems using just the RunConfig as we currently are?
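Setting the role in both places, as described above, could look roughly like the sketch below. It rests on assumptions: `ecs_run_kwargs` is a hypothetical helper, the template fields are placeholders rather than the contents of the linked example, the container name `flow` assumes the agent's convention for the flow container, and fields such as cpu/memory are omitted and would need to come from ECSRun's own arguments or the agent. The returned dict is meant to be unpacked into `ECSRun(...)` via its `task_definition`, `task_role_arn`, and `execution_role_arn` arguments.

```python
from typing import Any, Dict


def ecs_run_kwargs(task_role_arn: str, execution_role_arn: str) -> Dict[str, Any]:
    """Hypothetical helper: ECSRun kwargs with the task role set on both the
    RunConfig (task_role_arn) and the task definition template (taskRoleArn)."""
    task_definition = {
        # Minimal placeholder template; the agent is assumed to fill in the flow container.
        "containerDefinitions": [{"name": "flow"}],
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        # Stored on the registered revision, so it stays visible once the revision is INACTIVE.
        "taskRoleArn": task_role_arn,
        "executionRoleArn": execution_role_arn,
    }
    return {
        "task_definition": task_definition,
        "task_role_arn": task_role_arn,
        "execution_role_arn": execution_role_arn,
    }
```

Usage in the flow file would be `flow.run_config = ECSRun(**ecs_run_kwargs(os.environ["TASK_ROLE_ARN"], os.environ["EXECUTION_ROLE_ARN"]))`. Per this thread, the template's `taskRoleArn` is what made the role show up on the inactive definition, while `task_role_arn` alone was already enough for the flow to get its permissions.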
k
I’m not 100% sure that’s the reason it isn’t appearing, but I’m with you: if it works, it works 🤷. I don’t foresee any problems if it’s working.