Hi everyone, I am currently trying to setup an EC...
# prefect-community
d
Hi everyone, I am currently trying to set up an ECS agent on AWS. However, the task stops and exits with the error
essential container in task exited
. The Prefect ECS agent also does not appear in the UI. I have been able to start the ECS agent locally, but am unable to start it through AWS. Also, when I add a logConfiguration to see what could be going wrong with the service, I get the error
ResourceInitializationError: failed to validate logger args: : signal: killed
. I’ve double-checked with DevOps that the IAM roles and network configuration should be correct too. Any ideas on how to debug this or why this is happening?
k
First time I have seen the
ResourceInitializationError
myself. Are you using the base Prefect image or your own? Are you just following the docs or do you have any added configuration?
d
only difference I made was not using the aws parameter store for now
k
I think we need the logging to figure out the ECS issue. It might be a connectivity issue, or the log group might not exist, so I would check for those.
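One way to check whether the log group exists is to ask CloudWatch Logs directly. The sketch below is only an illustration: the helper name `log_group_exists` is made up for this example, and it assumes a boto3 `logs` client with credentials that allow `logs:DescribeLogGroups`.

```python
def log_group_exists(logs_client, group_name):
    """Return True if a log group named exactly `group_name` exists.

    `logs_client` is expected to behave like boto3.client("logs").
    """
    kwargs = {"logGroupNamePrefix": group_name}
    while True:
        page = logs_client.describe_log_groups(**kwargs)
        # The prefix filter can also match longer names, so compare exactly.
        for group in page.get("logGroups", []):
            if group.get("logGroupName") == group_name:
                return True
        token = page.get("nextToken")
        if not token:
            return False
        kwargs["nextToken"] = token


# Usage (assumption: boto3 is installed and AWS credentials are configured):
#   import boto3
#   client = boto3.client("logs", region_name="<your region>")
#   print(log_group_exists(client, "<your log group name>"))
```

A successful call from your own machine only shows that the group exists. It does not show that the task's subnets can reach CloudWatch.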
d
I’ve confirmed that the log group exists. I did ask DevOps about the connectivity earlier, but they said that I shouldn't be having trouble connecting to CloudWatch. I will try asking again about it, though.
k
The most common cause for the container exiting though is image incompatibility.
d
current image that is specified in the container definitions is
"image": "prefecthq/prefect:latest-python3.8",
k
Ah ok that should be good
d
I talked to devOps and they confirmed that connectivity is not the problem, as there would be a different error message. I also rechecked the log configuration but there doesn’t seem to be anything wrong.
"logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "$ECS_LOG_GROUP_NAME",
                    "awslogs-region": "$AWS_REGION",
                    "awslogs-stream-prefix": "ecs",
                    "awslogs-create-group": "true"
                }
            }
Is there any other way to add the logs to the task definition, like in the AWS console?
k
Not quite, because the Task Definition tab just gives an error but no logs if you don’t set up CloudWatch. It won’t fix your logging issue, but the
task exited
can also be due to a lack of IAM permissions
Like not able to pull the image
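Even without CloudWatch logs, ECS records a stop reason on the task and on each container. Here is a minimal sketch that pulls those fields out of a `describe_tasks` response. The function name and the sample response are hypothetical, and the sample is trimmed to the fields used here.

```python
def summarize_stopped_tasks(response):
    """Collect task-level and container-level stop details as text lines."""
    lines = []
    for task in response.get("tasks", []):
        arn = task.get("taskArn", "?")
        lines.append(f"task {arn}: {task.get('stoppedReason', 'no stoppedReason')}")
        for container in task.get("containers", []):
            lines.append(
                f"  container {container.get('name', '?')}: "
                f"exitCode={container.get('exitCode')} "
                f"reason={container.get('reason')}"
            )
    return lines


# Hypothetical response, shaped like the output of
# boto3.client("ecs").describe_tasks(cluster=..., tasks=[...])
sample = {
    "tasks": [
        {
            "taskArn": "arn:aws:ecs:region:xxx:task/example",
            "stoppedReason": "Essential container in task exited",
            "containers": [{"name": "agent", "exitCode": 1, "reason": None}],
        }
    ]
}

for line in summarize_stopped_tasks(sample):
    print(line)
```

An image pull failure usually shows up in the container-level `reason`, which helps tell it apart from a container that started and then crashed.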
d
I did once get a
CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "docker.io/prefecthq/prefect:latest-python3.8": failed to do request: Head https://registry-1.docker.io/v2/prefecthq/prefect/manifests/latest-python3.8: dial tcp ...
error while I was messing around with the security groups
I also double-checked the IAM permissions, and I should have the correct permissions.
a
it looks like this setting might be missing:
assignPublicIp=ENABLED
You can set up as part of the network configuration:
aws ecs create-service \
    --service-name $ECS_SERVICE_NAME \
    --task-definition $ECS_SERVICE_NAME:1 \
    --desired-count 1 \
    --launch-type FARGATE \
    --platform-version LATEST \
    --cluster $ECS_CLUSTER_NAME \
    --network-configuration awsvpcConfiguration="{subnets=[$SUBNET1, $SUBNET2, $SUBNET3],assignPublicIp=ENABLED}" --region $AWS_REGION
To explain it a bit more - my intuition (not 100% sure) is that your flow run container doesn't have access to the Internet in order to pull the container image from Dockerhub
d
I do have that setting enabled. However, the VPC and subnets I’m using are not the default ones, and they don't allow auto-assigning public IPv4 or IPv6 addresses. Would that affect it?
The VPC and subnets do have an internet connection through a NAT gateway.
a
That may indeed be the issue, since you need to make sure that the CONTAINER within that instance also has access to the internet through that NAT gateway. You can test this by launching a simple ECS task with a container that pulls its image from a public Docker Hub repo. Have you tried launching any ECS task within that container without Prefect? Did that work using the same network configuration?
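For a quick check from inside a container that already runs in those subnets, a plain TCP probe can show whether the registry is reachable. This is a rough sketch, not a Prefect or AWS tool, and the helper name `can_reach` is made up. Note that it cannot diagnose the image pull itself, since that happens before any container code runs.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Usage from a container in the same subnets and security groups:
#   print(can_reach("registry-1.docker.io"))
```

If the probe fails from a task that uses the same subnets and security groups, the route through the NAT gateway is the first thing to look at.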
d
I have not launched any task without prefect - is there any simple ecs task I can run to test this out?
a
There are a bunch of tutorials online. Here are some you can use:

https://www.youtube.com/watch?v=o_qSS4S1g34

https://www.youtube.com/watch?v=eq4wL2MiNqo

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_AWSCLI_Fargate.html

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-cli-tutorial-ec2.html
d
just tested out launching an ecs task using the aws docs tutorial, and it was able to run with no problems
a
Did you run it in the same subnets as your Prefect flows? Did you assign public IPs? Did the setup for this differ in any way from your Prefect ECSRun?
d
Yes, I ran it in the same subnets as the prefect flows. I don’t think I assigned public IPs as I followed the part of the tutorial where the example was using a private subnet. I also reran the prefect flows with the exact same command as the tutorial besides the task definition name and it returned the
ResourceInitializationError: failed to validate logger args: : signal: killed
error again. And even if I remove the log configuration part to see if it will run it will give the
essential container in task exited
k
Hey David, unfortunately I think there isn’t more advice we can give without really going into the AWS account and looking at the setup. We don’t offer that here in community support, but can always connect you to Prefect’s professional services team.
Or I dunno if Anna may have more ideas. She’s out sick though for now. Sorry about that.
d
I finally got the logs to work. It turns out I had to delete an existing log endpoint. DevOps had not tried this because he did not think that was the problem. So now the ECS agent shows up in the UI and is up and running. The only problem now is that when I try to run an example flow like this, I get the error
Parameter validation failed: Missing required parameter in networkConfiguration.awsvpcConfiguration: "subnets" Unknown parameter in networkConfiguration.awsvpcConfiguration: "Subnets", must be one of: subnets, securityGroups, assignPublicIp
. Do I need to specify network configurations again somewhere in the run config?
k
Glad you got that figured out. So this error is because Prefect can infer the default VPC, but if you have custom ones, you need to specify them with a job template. It seems like your agent was able to start though? How did you start your agent in the right VPC?
d
I got it to run by adding the --run-task-kwargs option to the command in the task definition, like in this post
k
So agent works but Flow still does not?
d
yes thats correct
k
So I think there are two places to put this. One is on the flow like this where you add a
task_definition_path
. That links to here , and then you can specify those subnet and security group there. The ECS agent also takes a definition upon starting . You can pass
--task_definition_path
. You just need to make sure that these live somewhere the agent can pull during runtime (like an S3 bucket it has access to). The agent task_definition serves as a default for the Flows that it runs, but the RunConfig can override it. The default one the agent uses can be found here
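The "agent provides defaults, RunConfig overrides them" idea can be pictured as a dictionary merge. This is only a conceptual sketch under that assumption. It is not Prefect's actual merge logic, and `merge_with_defaults` is a made-up name.

```python
def merge_with_defaults(agent_defaults, flow_overrides):
    """Shallow merge: any key set on the flow replaces the agent's value."""
    merged = dict(agent_defaults)  # copy, so the agent's defaults stay untouched
    merged.update(flow_overrides)
    return merged


# Placeholder values, for illustration only
agent_defaults = {
    "cluster": "default-cluster",
    "networkConfiguration": {
        "awsvpcConfiguration": {"subnets": ["subnet-xxx"], "securityGroups": ["sg-xxx"]}
    },
}
flow_overrides = {"cluster": "prefectEcsCluster"}

print(merge_with_defaults(agent_defaults, flow_overrides))
```

With a shallow merge like this, a flow that sets its own `networkConfiguration` replaces the agent's entire block rather than combining the two.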
d
what would be the correct format to specify the subnets and security groups? This is what I currently have but it says networkConfiguration parameter doesn’t exist.
networkMode: awsvpc
cpu: 512
memory: 1024
containerDefinitions:
  - name: prefectEcsAgent
networkConfiguration:
  awsvpcConfiguration:
    Subnets:
      - subnet-xxx
    securityGroups:
      - sg-xxx
    assignPublicIp: DISABLED
k
Let me look around. The quickest I found is this
Got my ECS set up. About to try this
Ok, I think my understanding was wrong. Per the AWS ECS docs here, you need to specify it at run time instead of in the task definition that gets registered. So custom VPCs need to be specified in the flow's
run_task_kwargs
or on the agent
If you specify the awsvpc network mode, the task is allocated an elastic network interface, and you must specify a NetworkConfiguration when you create a service or run a task with the task definition.
This is so painful I dunno why you can’t do it as part of the task definition
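To make the "at run time" part concrete, here is a rough sketch of supplying the network configuration in a boto3 `run_task` call rather than in the registered task definition. Both helper names are hypothetical, and the second one assumes boto3 is installed and AWS credentials are configured.

```python
def build_network_configuration(subnets, security_groups, assign_public_ip="DISABLED"):
    """Build the networkConfiguration argument for an awsvpc task."""
    if not subnets:
        raise ValueError("at least one subnet is required")
    if assign_public_ip not in ("ENABLED", "DISABLED"):
        raise ValueError("assign_public_ip must be ENABLED or DISABLED")
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnets),  # lowercase key, as the API expects
            "securityGroups": list(security_groups),
            "assignPublicIp": assign_public_ip,
        }
    }


def run_fargate_task(cluster, task_definition, network_configuration):
    """Start one Fargate task, passing the network settings at run time."""
    import boto3  # imported here so the builder above works without boto3

    ecs = boto3.client("ecs")
    return ecs.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        networkConfiguration=network_configuration,
    )


print(build_network_configuration(["subnet-xxx"], ["sg-xxx"]))
```

The same dictionary shape is what ends up under `networkConfiguration` in `run_task_kwargs`.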
d
Yep it works after I specified the subnets and security group inside the run_task_kwargs argument in the runConfig for the flow. Interesting that I have to since I already specified the subnets/security groups with the run_task_kwargs parameter when starting the agent.
k
The kwargs on the agent should propagate. Could you share your syntax?
d
{
  "family": "$ECS_SERVICE_NAME",
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "taskRoleArn": "arn:aws:iam::xxx:role/prefectTaskRole",
  "executionRoleArn": "arn:aws:iam::xxx:role/prefectECSAgentTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "$ECS_SERVICE_NAME",
      "image": "prefecthq/prefect",
      "essential": true,
      "command": [
        "prefect",
        "agent",
        "ecs",
        "start",
        "--run-task-kwargs",
        "s3://xxx-test-bucket/david/ecs-config.yaml"
      ],
      "environment": [
        {
          "name": "PREFECT__CLOUD__API_KEY",
          "value": "xxx"
        },
        {
          "name": "PREFECT__CLOUD__AGENT__LABELS",
          "value": "['dev']"
        },
        {
          "name": "PREFECT__CLOUD__AGENT__LEVEL",
          "value": "INFO"
        },
        {
          "name": "PREFECT__CLOUD__API",
          "value": "https://api.prefect.io"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "$ECS_LOG_GROUP_NAME",
          "awslogs-region": "$AWS_REGION",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "true"
        }
      }
    }
  ]
}
and inside the yaml file
networkConfiguration:
  awsvpcConfiguration:
    Subnets:
      - subnet-xxx
    securityGroups:
      - sg-xxx
    assignPublicIp: DISABLED
I also have the network configuration setup when creating the service with aws ecs create-service
aws ecs create-service \
    --service-name $ECS_SERVICE_NAME \
    --task-definition $ECS_SERVICE_NAME:1 \
    --desired-count 1 \
    --launch-type FARGATE \
    --cluster $ECS_CLUSTER_NAME \
    --network-configuration awsvpcConfiguration="{subnets=[$SUBNET1],securityGroups=[$SECURITYGROUP]}" --region $AWS_REGION
To get the flow to run I just put in the networkConfiguration parameter in the RUN_CONFIG
from prefect.run_configs import ECSRun

RUN_CONFIG = ECSRun(
    labels=["dev"],
    task_role_arn="arn:aws:iam::xxx:role/prefectTaskRole",
    run_task_kwargs=dict(
        cluster="prefectEcsCluster",
        # awsvpc tasks take their network configuration at run time
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-xxx"],
                "securityGroups": ["sg-xxx"],
                "assignPublicIp": "DISABLED",
            }
        },
    ),
)
c
This might be unhelpful advice at this stage, but I was reading through while trying to get my own ECS set up. Your yaml file has "Subnets" with a capital S, which I think is wrong. That's what this error message is saying anyway: https://prefect-community.slack.com/archives/CL09KU1K7/p1644359607318099?thread_ts=1643817770.525689&cid=CL09KU1K7
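A small check like the one below could catch this kind of casing mistake before the config reaches ECS. It is a sketch: the function name is made up, and the allowed keys are taken from the error message quoted earlier in this thread.

```python
# Keys listed in the error: "must be one of: subnets, securityGroups, assignPublicIp"
ALLOWED_KEYS = ("subnets", "securityGroups", "assignPublicIp")


def check_awsvpc_keys(network_configuration):
    """Return a list of problems found in awsvpcConfiguration (empty if none)."""
    config = network_configuration.get("awsvpcConfiguration", {})
    problems = []
    for key in config:
        if key in ALLOWED_KEYS:
            continue
        # Look for an allowed key that differs only by letter case.
        match = next((k for k in ALLOWED_KEYS if k.lower() == key.lower()), None)
        if match:
            problems.append(f'unknown key "{key}", did you mean "{match}"?')
        else:
            problems.append(f'unknown key "{key}"')
    if "subnets" not in config:
        problems.append('missing required key "subnets"')
    return problems


# Same shape as the YAML in this thread, loaded into a dict
from_yaml = {
    "awsvpcConfiguration": {
        "Subnets": ["subnet-xxx"],
        "securityGroups": ["sg-xxx"],
        "assignPublicIp": "DISABLED",
    }
}

for problem in check_awsvpc_keys(from_yaml):
    print(problem)
```

The parameter names are case-sensitive, so "Subnets" counts as an unknown key and the required "subnets" counts as missing, which matches the two parts of the validation error.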
d
Looks like you were right! The flow now works normally without any extra added networkConfigurations needed.