    David Wang

    7 months ago
    Hi everyone, I am currently trying to set up an ECS agent on AWS. However, the task stops and exits with the error
    essential container in task exited
    . The Prefect ECS agent also does not appear in the UI. I have been able to start the ECS agent locally, but am unable to start it on AWS. Another thing to mention is that when I add a logConfiguration to see what could be going wrong with the service, I get the error
    ResourceInitializationError: failed to validate logger args: : signal: killed
    . I’ve double checked with devOps that the IAM roles and network configurations should be correct too. Any ideas on how to debug this or why this is happening?
    Kevin Kho

    7 months ago
    First time I have seen the
    ResourceInitializationError
    myself. Are you using the base Prefect image or your own? Are you just following the docs or do you have any added configuration?
    David Wang

    7 months ago
    The only difference is that I am not using the AWS Parameter Store for now
    Kevin Kho

    7 months ago
    I think we need the logging to figure out the ECS issue. It might be a connectivity issue, or the log group might not exist, so I would check for those.
    David Wang

    7 months ago
    I’ve confirmed that the log group exists. I did ask DevOps about the connectivity earlier, but they said that I shouldn’t be having trouble connecting to CloudWatch. I will try asking again about it though
    Kevin Kho

    7 months ago
    The most common cause for the container exiting though is image incompatibility.
    David Wang

    7 months ago
    current image that is specified in the container definitions is
    "image": "prefecthq/prefect:latest-python3.8",
    Kevin Kho

    7 months ago
    Ah ok that should be good
    David Wang

    7 months ago
    I talked to devOps and they confirmed that connectivity is not the problem, as there would be a different error message. I also rechecked the log configuration but there doesn’t seem to be anything wrong.
    "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": "$ECS_LOG_GROUP_NAME",
                        "awslogs-region": "$AWS_REGION",
                        "awslogs-stream-prefix": "ecs",
                        "awslogs-create-group": "true"
                    }
                }
    is there any other way to add the logs to the task definition like in the aws console?
    Kevin Kho

    7 months ago
    Not quite cuz the Task Definition tab just gives an error but no logs if you don’t set up CloudWatch. It won’t fix your logging issue, but the
    task exited
    can also be due to a lack of IAM permissions
    Like not able to pull the image
    David Wang

    7 months ago
    I did once get a
    CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "docker.io/prefecthq/prefect:latest-python3.8": failed to do request: Head https://registry-1.docker.io/v2/prefecthq/prefect/manifests/latest-python3.8: dial tcp ...
    error while I was messing around with the security groups
    I did also double-check the IAM permissions and I should have the correct permissions
    Anna Geller

    7 months ago
    it looks like this setting might be missing:
    assignPublicIp=ENABLED
    You can set it as part of the network configuration:
    aws ecs create-service \
        --service-name $ECS_SERVICE_NAME \
        --task-definition $ECS_SERVICE_NAME:1 \
        --desired-count 1 \
        --launch-type FARGATE \
        --platform-version LATEST \
        --cluster $ECS_CLUSTER_NAME \
        --network-configuration awsvpcConfiguration="{subnets=[$SUBNET1, $SUBNET2, $SUBNET3],assignPublicIp=ENABLED}" --region $AWS_REGION
    To explain it a bit more - my intuition (not 100% sure) is that your flow run container doesn't have access to the Internet in order to pull the container image from Dockerhub
    David Wang

    7 months ago
    I do have that setting enabled. However, the VPC and subnets I’m using are not the defaults, nor do they allow auto-assigning public IPv4 or IPv6 addresses. Would that affect it?
    The VPC and subnet do have an internet connection through a NAT gateway
    Anna Geller

    7 months ago
    That may indeed be the issue, since you need to make sure that the CONTAINER within that instance also has access to the internet through that NAT gateway. You can test this by launching a simple ECS task with a container that pulls its image from a public Docker Hub repo. Have you tried launching any ECS task without Prefect? Did that work using the same network configuration?
    David Wang

    7 months ago
    I have not launched any task without prefect - is there any simple ecs task I can run to test this out?
    Anna Geller

    7 months ago
    there are a bunch of tutorials online, here are some you can use:
    https://www.youtube.com/watch?v=o_qSS4S1g34
    https://www.youtube.com/watch?v=eq4wL2MiNqo
    https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_AWSCLI_Fargate.html
    https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-cli-tutorial-ec2.html
    David Wang

    7 months ago
    just tested out launching an ecs task using the aws docs tutorial, and it was able to run with no problems
    Anna Geller

    7 months ago
    Did you run it in the same subnets as your Prefect flows? Did you assign public IPs? Did the setup for this differ in any way from your Prefect ECSRun?
    David Wang

    7 months ago
    Yes, I ran it in the same subnets as the prefect flows. I don’t think I assigned public IPs as I followed the part of the tutorial where the example was using a private subnet. I also reran the prefect flows with the exact same command as the tutorial besides the task definition name and it returned the
    ResourceInitializationError: failed to validate logger args: : signal: killed
    error again. And even if I remove the log configuration part to see if it will run it will give the
    essential container in task exited
    Kevin Kho

    7 months ago
    Hey David, unfortunately I think there isn’t more advice we can give without really going into the AWS account and looking at the setup. We don’t offer that here in community support, but can always connect you to Prefect’s professional services team.
    Or I dunno if Anna may have more ideas. She’s out sick though for now. Sorry about that.
    David Wang

    7 months ago
    I finally got the logs to work. It turns out I had to delete an existing log endpoint. DevOps had not tried this because he did not think that was the problem. So now the ECS agent shows up in the UI and is up and running. The only problem now is that when I try to run an example flow, it gives me an error of
    Parameter validation failed:
    Missing required parameter in networkConfiguration.awsvpcConfiguration: "subnets"
    Unknown parameter in networkConfiguration.awsvpcConfiguration: "Subnets", must be one of: subnets, securityGroups, assignPublicIp
    . Do I need to specify network configurations again somewhere in the run config?
    Kevin Kho

    7 months ago
    Glad you got that figured out. So this error is because Prefect can infer the default VPC, but if you have custom ones, you need to specify them with a job template. It seems like your agent was able to start though? How did you start your agent in the right VPC?
    David Wang

    7 months ago
    I got it to run by adding the --run-task-kwargs option to the command in the task definition, like in this post
    Kevin Kho

    7 months ago
    So agent works but Flow still does not?
    David Wang

    7 months ago
    Yes, that's correct
    Kevin Kho

    7 months ago
    So I think there are two places to put this. One is on the flow like this where you add a
    task_definition_path
    . That points to a task definition file, and you can specify the subnets and security groups there. The ECS agent also takes a task definition when it starts. You can pass
    --task-definition
    . You just need to make sure that these live somewhere the agent can pull during runtime (like an S3 bucket it has access to). The agent task_definition serves as a default for the Flows that it runs, but the RunConfig can override it. The default one the agent uses can be found here
    David Wang

    7 months ago
    what would be the correct format to specify the subnets and security groups? This is what I currently have but it says networkConfiguration parameter doesn’t exist.
    networkMode: awsvpc
    cpu: 512
    memory: 1024
    containerDefinitions:
      - name: prefectEcsAgent
    networkConfiguration:
      awsvpcConfiguration:
        Subnets:
          - subnet-xxx
        securityGroups:
          - sg-xxx
        assignPublicIp: DISABLED
    Kevin Kho

    7 months ago
    Let me look around. The quickest I found is this
    Got my ECS set up. About to try this
    Ok, I think my understanding was wrong. Per the AWS ECS docs, you need to specify it at run time instead of in the task definition that gets registered. So custom VPCs need to be specified in the flow's
    run_task_kwargs
    or on the agent
    If you specify the awsvpc network mode, the task is allocated an elastic network interface, and you must specify a NetworkConfiguration when you create a service or run a task with the task definition.
    This is so painful I dunno why you can’t do it as part of the task definition
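A minimal sketch of the point quoted above, assuming the arguments eventually reach an ECS RunTask request (for example through boto3's `run_task`): the network settings travel with each run request rather than with the registered task definition. The helper name `build_run_task_kwargs`, the task definition name `my-flow-task`, and the placeholder IDs are hypothetical; the dictionary keys follow the names listed in the error message earlier in the thread.

```python
from typing import Any, Dict, List


def build_run_task_kwargs(
    cluster: str,
    task_definition: str,
    subnets: List[str],
    security_groups: List[str],
    assign_public_ip: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for an ECS RunTask request (hypothetical helper)."""
    if not subnets:
        raise ValueError("awsvpc network mode needs at least one subnet")
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        # Supplied at run time, not stored in the registered task definition.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),  # lowercase "subnets"
                "securityGroups": list(security_groups),
                "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
            }
        },
    }


kwargs = build_run_task_kwargs(
    cluster="prefectEcsCluster",
    task_definition="my-flow-task",  # hypothetical name
    subnets=["subnet-xxx"],
    security_groups=["sg-xxx"],
)
print(kwargs["networkConfiguration"])
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("ecs").run_task(**kwargs)`; that call is deliberately not made here.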
    David Wang

    7 months ago
    Yep it works after I specified the subnets and security group inside the run_task_kwargs argument in the runConfig for the flow. Interesting that I have to since I already specified the subnets/security groups with the run_task_kwargs parameter when starting the agent.
    Kevin Kho

    7 months ago
    The kwargs on the agent should propagate. Could you share your syntax?
    David Wang

    7 months ago
    {
      "family": "$ECS_SERVICE_NAME",
      "requiresCompatibilities": [
        "FARGATE"
      ],
      "networkMode": "awsvpc",
      "cpu": "512",
      "memory": "1024",
      "taskRoleArn": "arn:aws:iam::xxx:role/prefectTaskRole",
      "executionRoleArn": "arn:aws:iam::xxx:role/prefectECSAgentTaskExecutionRole",
      "containerDefinitions": [
        {
          "name": "$ECS_SERVICE_NAME",
          "image": "prefecthq/prefect",
          "essential": true,
          "command": [
            "prefect",
            "agent",
            "ecs",
            "start",
            "--run-task-kwargs",
            "s3://xxx-test-bucket/david/ecs-config.yaml"
          ],
          "environment": [
            {
              "name": "PREFECT__CLOUD__API_KEY",
              "value": "xxx"
            },
            {
              "name": "PREFECT__CLOUD__AGENT__LABELS",
              "value": "['dev']"
            },
            {
              "name": "PREFECT__CLOUD__AGENT__LEVEL",
              "value": "INFO"
            },
            {
              "name": "PREFECT__CLOUD__API",
              "value": "https://api.prefect.io"
            }
          ],
          "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
              "awslogs-group": "$ECS_LOG_GROUP_NAME",
              "awslogs-region": "$AWS_REGION",
              "awslogs-stream-prefix": "ecs",
              "awslogs-create-group": "true"
            }
          }
        }
      ]
    }
    and inside the yaml file
    networkConfiguration:
      awsvpcConfiguration:
        Subnets:
          - subnet-xxx
        securityGroups:
          - sg-xxx
        assignPublicIp: DISABLED
    I also have the network configuration setup when creating the service with aws ecs create-service
    aws ecs create-service \
        --service-name $ECS_SERVICE_NAME \
        --task-definition $ECS_SERVICE_NAME:1 \
        --desired-count 1 \
        --launch-type FARGATE \
        --cluster $ECS_CLUSTER_NAME \
        --network-configuration awsvpcConfiguration="{subnets=[$SUBNET1],securityGroups=[$SECURITYGROUP]}" --region $AWS_REGION
    To get the flow to run I just put in the networkConfiguration parameter in the RUN_CONFIG
    from prefect.run_configs import ECSRun

    RUN_CONFIG = ECSRun(
        labels=["dev"],
        task_role_arn="arn:aws:iam::xxx:role/prefectTaskRole",
        run_task_kwargs={
            "cluster": "prefectEcsCluster",
            # Network settings are passed at run time, not in the task definition
            "networkConfiguration": {
                "awsvpcConfiguration": {
                    "subnets": ["subnet-xxx"],
                    "securityGroups": ["sg-xxx"],
                    "assignPublicIp": "DISABLED",
                }
            },
        },
    )
    Christopher

    7 months ago
    This might be unhelpful advice at this stage, but I was reading through while trying to get my own ECS set up. Your yaml file has "Subnets" with a capital S, which I think is wrong. That's what this error message is saying anyway: https://prefect-community.slack.com/archives/CL09KU1K7/p1644359607318099?thread_ts=1643817770.525689&cid=CL09KU1K7
    David Wang

    7 months ago
    Looks like you were right! The flow now works normally without any extra added networkConfigurations needed.
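Since the root cause was a key that differed only in letter case, a small pre-flight check can catch it before anything reaches AWS. This is a sketch under assumptions: the function name `check_awsvpc_keys` is hypothetical, and the allowed key names are taken from the "must be one of" list in the error message earlier in the thread.

```python
from typing import Any, Dict, List

# Key names accepted under awsvpcConfiguration, per the error message in the thread.
ALLOWED_KEYS = ("subnets", "securityGroups", "assignPublicIp")


def check_awsvpc_keys(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an awsvpcConfiguration mapping."""
    problems = []
    by_lower = {name.lower(): name for name in ALLOWED_KEYS}
    for key in config:
        if key in ALLOWED_KEYS:
            continue
        # Keys are case-sensitive, so suggest the correctly cased spelling if one exists.
        suggestion = by_lower.get(key.lower())
        if suggestion:
            problems.append(f'unknown key "{key}", did you mean "{suggestion}"?')
        else:
            problems.append(f'unknown key "{key}"')
    if "subnets" not in config:
        problems.append('missing required key "subnets"')
    return problems


bad = {"Subnets": ["subnet-xxx"], "securityGroups": ["sg-xxx"], "assignPublicIp": "DISABLED"}
good = {"subnets": ["subnet-xxx"], "securityGroups": ["sg-xxx"], "assignPublicIp": "DISABLED"}

for problem in check_awsvpc_keys(bad):
    print(problem)
print(check_awsvpc_keys(good))
```

Running the check on the mapping loaded from the agent's YAML file would flag `Subnets` twice, once as an unknown key and once as a missing `subnets`, which mirrors the two lines of the original validation error.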