David Wang

02/02/2022, 4:02 PM
Hi everyone, I am currently trying to set up an ECS agent on AWS. However, the task seems to stop and exit with the error
essential container in task exited
. The Prefect ECS agent also does not appear in the UI. I have been able to start the ECS agent locally, but am unable to start it through AWS. Another thing to mention is that when I add the logConfiguration to see what could be going wrong with the service, it gives me the error
ResourceInitializationError: failed to validate logger args: : signal: killed
. I’ve double checked with devOps that the IAM roles and network configurations should be correct too. Any ideas on how to debug this or why this is happening?

Kevin Kho

02/02/2022, 4:05 PM
First time I have seen the
ResourceInitializationError
myself. Are you using the base Prefect image or your own? Are you just following the docs or do you have any added configuration?

David Wang

02/02/2022, 4:06 PM
only difference I made was not using the aws parameter store for now

Kevin Kho

02/02/2022, 4:15 PM
I think we need the logs to figure out the ECS issue. The logging error might be a connectivity issue, or the log group might not exist, so I would check those.

David Wang

02/02/2022, 4:29 PM
I’ve confirmed that the log group exists. I did ask devOps about the connectivity earlier, but they said that I shouldn’t be having trouble connecting to CloudWatch. I will try asking again about it though.

Kevin Kho

02/02/2022, 4:31 PM
The most common cause for the container exiting though is image incompatibility.

David Wang

02/02/2022, 4:32 PM
current image that is specified in the container definitions is
"image": "prefecthq/prefect:latest-python3.8",

Kevin Kho

02/02/2022, 4:34 PM
Ah ok that should be good

David Wang

02/02/2022, 10:20 PM
I talked to devOps and they confirmed that connectivity is not the problem, as there would be a different error message. I also rechecked the log configuration but there doesn’t seem to be anything wrong.
"logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "$ECS_LOG_GROUP_NAME",
                    "awslogs-region": "$AWS_REGION",
                    "awslogs-stream-prefix": "ecs",
                    "awslogs-create-group": "true"
                }
            }
Is there any other way to add the logs to the task definition, like in the AWS console?

Kevin Kho

02/02/2022, 10:23 PM
Not quite cuz the Task Definition tab just gives an error but no logs if you don’t set up CloudWatch. It won’t fix your logging issue, but the
task exited
can also be due to a lack of IAM permissions
Like not able to pull the image

David Wang

02/07/2022, 4:21 PM
I did once get a
CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "docker.io/prefecthq/prefect:latest-python3.8": failed to do request: Head https://registry-1.docker.io/v2/prefecthq/prefect/manifests/latest-python3.8: dial tcp ...
error while I was messing around with the security groups
I did also double check the IAM permissions and I should have the correct permissions

Anna Geller

02/07/2022, 5:08 PM
it looks like this setting might be missing:
assignPublicIp=ENABLED
You can set it as part of the network configuration:
aws ecs create-service \
    --service-name $ECS_SERVICE_NAME\
    --task-definition $ECS_SERVICE_NAME:1 \
    --desired-count 1 \
    --launch-type FARGATE \
    --platform-version LATEST \
    --cluster $ECS_CLUSTER_NAME \
    --network-configuration awsvpcConfiguration="{subnets=[$SUBNET1, $SUBNET2, $SUBNET3],assignPublicIp=ENABLED}" --region $AWS_REGION
To explain it a bit more - my intuition (not 100% sure) is that your flow run container doesn't have access to the Internet in order to pull the container image from Dockerhub
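One way to check that intuition is to probe the registry host from inside the same network. The sketch below is only an illustration: the helper name `can_reach` is hypothetical, and it assumes that a plain TCP connection to port 443 is a fair stand-in for what the image pull needs. It would have to run from a container or instance placed in the same subnets and security groups as the failing task.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only tests TCP reachability, not registry auth.
    """
    try:
        # create_connection resolves the name and opens the socket in one call.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_reach("registry-1.docker.io")` uses the host named in the CannotPullContainerError above; a `False` result would point at routing, NAT, or security group rules rather than at Prefect.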

David Wang

02/07/2022, 6:50 PM
I do have that setting enabled. However, the VPC and subnets I’m using are not the default ones, and they don’t allow auto-assigning public IPv4 or IPv6 addresses. Would that affect it?
The VPC and subnets do have an internet connection through a NAT gateway.

Anna Geller

02/07/2022, 6:57 PM
That may indeed be the issue, since you need to make sure that the CONTAINER within that instance also has access to the internet through that NAT gateway. You can test this by launching a simple ECS task with a container that pulls its image from a public Docker Hub repo. Have you tried launching any ECS task without Prefect? Did that work using the same network configuration?
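The public IP versus NAT question can be written down as a small rule. This is a simplified sketch with a hypothetical function name, and it assumes that security groups and network ACLs already allow outbound traffic: a Fargate task can reach Docker Hub if its subnet routes through a NAT gateway, or if its subnet routes to an internet gateway and the task has a public IP.

```python
def has_outbound_internet(route_to_internet_gateway: bool,
                          route_to_nat_gateway: bool,
                          assign_public_ip: str) -> bool:
    """Rough check of whether a Fargate task can reach the public internet.

    Hypothetical helper; ignores security groups and network ACLs.
    """
    if route_to_nat_gateway:
        # Private subnet behind a NAT gateway: no public IP is needed.
        return True
    # Public subnet: the task itself needs a public IP to use the gateway.
    return route_to_internet_gateway and assign_public_ip == "ENABLED"
```

Under this rule, a private subnet with a NAT gateway and `assignPublicIp` set to `DISABLED` should still be able to pull public images.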

David Wang

02/07/2022, 6:59 PM
I have not launched any task without prefect - is there any simple ecs task I can run to test this out?

Anna Geller

02/07/2022, 7:04 PM
there are a bunch of tutorials online, here are some you can use:
• https://www.youtube.com/watch?v=o_qSS4S1g34
• https://www.youtube.com/watch?v=eq4wL2MiNqo
• https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_AWSCLI_Fargate.html
• https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-cli-tutorial-ec2.html

David Wang

02/07/2022, 9:06 PM
just tested out launching an ecs task using the aws docs tutorial, and it was able to run with no problems

Anna Geller

02/08/2022, 2:43 PM
Did you run it in the same subnets as your Prefect flows? Did you assign public IPs? Did the setup for this differ in any way from your Prefect ECSRun?

David Wang

02/08/2022, 4:08 PM
Yes, I ran it in the same subnets as the prefect flows. I don’t think I assigned public IPs as I followed the part of the tutorial where the example was using a private subnet. I also reran the prefect flows with the exact same command as the tutorial besides the task definition name and it returned the
ResourceInitializationError: failed to validate logger args: : signal: killed
error again. And even if I remove the log configuration part to see if it will run it will give the
essential container in task exited

Kevin Kho

02/08/2022, 4:25 PM
Hey David, unfortunately I think there isn’t more advice we can give without really going into the AWS account and looking at the setup. We don’t offer that here in community support, but can always connect you to Prefect’s professional services team.
Or I dunno if Anna may have more ideas. She’s out sick though for now. Sorry about that.

David Wang

02/08/2022, 10:33 PM
I finally got the logs to work. It turns out I had to delete an existing log endpoint. DevOps had not tried this because he did not think that was the problem. So now the ECS agent shows up in the UI and is up and running. The only problem now is that when I try to run an example flow like this, it gives me the error
Parameter validation failed: Missing required parameter in networkConfiguration.awsvpcConfiguration: "subnets" Unknown parameter in networkConfiguration.awsvpcConfiguration: "Subnets", must be one of: subnets, securityGroups, assignPublicIp
. Do I need to specify network configurations again somewhere in the run config?

Kevin Kho

02/08/2022, 11:09 PM
Glad you got that figured out. So this error is because Prefect can infer the default VPC, but if you have custom ones, you need to specify them with a job template. It seems like your agent was able to start though? How did you start your agent in the right VPC?

David Wang

02/09/2022, 4:19 PM
I got it to run by adding the --run-task-kwargs option to the agent command in the task definition, like in this post

Kevin Kho

02/09/2022, 4:24 PM
So agent works but Flow still does not?

David Wang

02/09/2022, 4:24 PM
Yes, that’s correct

Kevin Kho

02/09/2022, 4:31 PM
So I think there are two places to put this. One is on the flow like this where you add a
task_definition_path
. That links to here, and then you can specify the subnets and security groups there. The ECS agent also takes a task definition upon starting. You can pass
--task_definition_path
. You just need to make sure that these live somewhere the agent can pull during runtime (like an S3 bucket it has access to). The agent task_definition serves as a default for the Flows that it runs, but the RunConfig can override it. The default one the agent uses can be found here

David Wang

02/09/2022, 8:10 PM
what would be the correct format to specify the subnets and security groups? This is what I currently have but it says networkConfiguration parameter doesn’t exist.
networkMode: awsvpc
cpu: 512
memory: 1024
containerDefinitions:
  - name: prefectEcsAgent
networkConfiguration:
  awsvpcConfiguration:
    Subnets:
      - subnet-xxx
    securityGroups:
      - sg-xxx
    assignPublicIp: DISABLED

Kevin Kho

02/09/2022, 8:21 PM
Let me look around. The quickest I found is this
Got my ECS set up. About to try this
Ok I think my understanding was wrong. Per the AWS ECS docs here, you need to specify it at run time instead of in the task definition that gets registered. So custom VPCs need to be specified in the flow’s
run_task_kwargs
or on the agent
If you specify the awsvpc network mode, the task is allocated an elastic network interface, and you must specify a NetworkConfiguration when you create a service or run a task with the task definition.
This is so painful I dunno why you can’t do it as part of the task definition
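To make the run-time requirement concrete, here is a minimal sketch of the keyword arguments a run-task call needs for an awsvpc task. The helper name `build_run_task_kwargs` and its defaults are assumptions for illustration; the key names follow the ones listed in the validation error earlier in the thread (`subnets`, `securityGroups`, `assignPublicIp`). The function only builds the dictionary and does not call AWS.

```python
from typing import Any, Dict, List


def build_run_task_kwargs(cluster: str,
                          task_definition: str,
                          subnets: List[str],
                          security_groups: List[str],
                          assign_public_ip: str = "DISABLED") -> Dict[str, Any]:
    """Build run-time kwargs for an awsvpc task (hypothetical helper)."""
    if assign_public_ip not in ("ENABLED", "DISABLED"):
        raise ValueError("assign_public_ip must be 'ENABLED' or 'DISABLED'")
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        # The network settings live here, not in the registered task definition.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": assign_public_ip,
            }
        },
    }
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("ecs").run_task(**kwargs)`. The same `networkConfiguration` structure is what goes into `run_task_kwargs` on the Prefect side.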

David Wang

02/09/2022, 10:17 PM
Yep it works after I specified the subnets and security group inside the run_task_kwargs argument in the runConfig for the flow. Interesting that I have to since I already specified the subnets/security groups with the run_task_kwargs parameter when starting the agent.

Kevin Kho

02/09/2022, 10:22 PM
The kwargs on the agent should propagate. Could you share your syntax?
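The propagation described here can be pictured as a merge in which agent-level kwargs act as defaults and flow-level kwargs win on conflicts. The following is a simplified sketch with a hypothetical function name, not Prefect's actual merge code, which may treat some keys (such as `overrides`) specially.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_kwargs(agent_defaults: Dict[str, Any],
                 flow_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow merge: top-level keys from the flow replace the agent's."""
    merged = deepcopy(agent_defaults)  # leave the agent defaults untouched
    merged.update(deepcopy(flow_kwargs))
    return merged
```

Under this model, a `networkConfiguration` supplied on the flow replaces the agent's one entirely, while keys the flow does not set (for example `cluster`) still come from the agent.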

David Wang

02/10/2022, 3:57 PM
{
  "family": "$ECS_SERVICE_NAME",
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "taskRoleArn": "arn:aws:iam::xxx:role/prefectTaskRole",
  "executionRoleArn": "arn:aws:iam::xxx:role/prefectECSAgentTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "$ECS_SERVICE_NAME",
      "image": "prefecthq/prefect",
      "essential": true,
      "command": [
        "prefect",
        "agent",
        "ecs",
        "start",
        "--run-task-kwargs",
        "s3://xxx-test-bucket/david/ecs-config.yaml"
      ],
      "environment": [
        {
          "name": "PREFECT__CLOUD__API_KEY",
          "value": "xxx"
        },
        {
          "name": "PREFECT__CLOUD__AGENT__LABELS",
          "value": "['dev']"
        },
        {
          "name": "PREFECT__CLOUD__AGENT__LEVEL",
          "value": "INFO"
        },
        {
          "name": "PREFECT__CLOUD__API",
          "value": "https://api.prefect.io"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "$ECS_LOG_GROUP_NAME",
          "awslogs-region": "$AWS_REGION",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "true"
        }
      }
    }
  ]
}
And inside the YAML file:
networkConfiguration:
  awsvpcConfiguration:
    Subnets:
      - subnet-xxx
    securityGroups:
      - sg-xxx
    assignPublicIp: DISABLED
I also have the network configuration set up when creating the service with aws ecs create-service
aws ecs create-service \
    --service-name $ECS_SERVICE_NAME\
    --task-definition $ECS_SERVICE_NAME:1 \
    --desired-count 1 \
    --launch-type FARGATE \
    --cluster $ECS_CLUSTER_NAME \
    --network-configuration awsvpcConfiguration="{subnets=[$SUBNET1],securityGroups=[$SECURITYGROUP]}" --region $AWS_REGION
To get the flow to run I just put in the networkConfiguration parameter in the RUN_CONFIG
RUN_CONFIG = ECSRun(
    labels=["dev"],
    task_role_arn="arn:aws:iam::xxx:role/prefectTaskRole",
    run_task_kwargs=dict(cluster="prefectEcsCluster",networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': [
                    'subnet-xxx'
                ],
                'securityGroups': [
                    'sg-xxx',
                ],
                'assignPublicIp': 'DISABLED'
            }
        }),
)

Christopher

02/11/2022, 6:53 AM
This might be unhelpful advice at this stage, but I was reading through while trying to get my own ECS setup working. Your YAML file has "Subnets" with a capital S, which I think is wrong. That's what this error message is saying anyway: https://prefect-community.slack.com/archives/CL09KU1K7/p1644359607318099?thread_ts=1643817770.525689&cid=CL09KU1K7

David Wang

02/11/2022, 4:28 PM
Looks like you were right! The flow now works normally without any extra added networkConfigurations needed.
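Since the root cause was a case-sensitive key, a small pre-flight check can catch this kind of typo before the config reaches AWS. This is a minimal sketch with a hypothetical function name; the allowed keys are taken from the validation error quoted earlier in the thread, and the message wording is only an approximation of that error.

```python
from typing import Any, Dict, List

# Keys listed in the "must be one of" part of the validation error.
ALLOWED_KEYS = {"subnets", "securityGroups", "assignPublicIp"}


def check_awsvpc_configuration(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an awsvpcConfiguration mapping."""
    problems = []
    for key in config:
        if key not in ALLOWED_KEYS:  # comparison is case-sensitive
            problems.append(f'Unknown parameter: "{key}"')
    if "subnets" not in config:
        problems.append('Missing required parameter: "subnets"')
    return problems
```

Running it on the YAML content shared above, for example `check_awsvpc_configuration({"Subnets": ["subnet-xxx"], "securityGroups": ["sg-xxx"], "assignPublicIp": "DISABLED"})`, reports both the unknown `Subnets` key and the missing `subnets` key, while the corrected lowercase version returns an empty list.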