Abhishek

    1 year ago
    Problem running the ECS agent: I am running an ECS agent as follows:
    prefect agent ecs start --key $KEY --task-role-arn $TASK_ARN --log-level INFO --label s3_sync --name farget-dev --execution-role-arn $EXEC_ROLE_ARN
    and it's giving me an error:
    ValueError: Failed to infer default networkConfiguration, please explicitly configure using `--run-task-kwargs`
    I created an ECS cluster (the default one) with the following command:
    aws ecs create-cluster
    which creates a cluster named "default". Can anyone point out what I am missing?
    Anna Geller (old account)

    1 year ago
    It seems that you are missing the networkConfiguration. You could extract the subnet IDs this way:
    export AWS_REGION=set_your_region
    
    SUBNETS=$(aws ec2 describe-subnets --region $AWS_REGION)
    export SUBNET1=$(echo $SUBNETS | jq -r '.Subnets | .[0].SubnetId')
    export SUBNET2=$(echo $SUBNETS | jq -r '.Subnets | .[1].SubnetId')
    export SUBNET3=$(echo $SUBNETS | jq -r '.Subnets | .[2].SubnetId')
    And then you can set it as an environment variable:
    export networkConfiguration="{'awsvpcConfiguration': {'assignPublicIp': 'ENABLED', 'subnets': ['$SUBNET1', '$SUBNET2', '$SUBNET3'], 'securityGroups': []}}"
    Specifying the network configuration is important because otherwise Prefect will not know enough about the VPC into which it should deploy your flows.
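    For anyone who prefers Python to jq, here is a minimal sketch of the same idea. It assumes you already have the parsed JSON from `aws ec2 describe-subnets` (boto3's `describe_subnets` returns the same `Subnets`/`SubnetId` shape) and builds the `networkConfiguration` structure used by the ECS RunTask API. The function name and the "first three subnets" default are illustrative choices mirroring the shell snippet, not part of Prefect.

```python
import json
from typing import Any, Dict


def build_network_configuration(
    describe_subnets_response: Dict[str, Any],
    max_subnets: int = 3,
    assign_public_ip: bool = True,
) -> Dict[str, Any]:
    """Build a RunTask-style networkConfiguration from a describe-subnets response."""
    # Same idea as the jq snippet: take the first few SubnetIds
    subnet_ids = [s["SubnetId"] for s in describe_subnets_response.get("Subnets", [])]
    subnet_ids = subnet_ids[:max_subnets]
    if not subnet_ids:
        raise ValueError("no subnets found; check the region or VPC filter")
    return {
        "awsvpcConfiguration": {
            "subnets": subnet_ids,
            # RunTask expects the strings "ENABLED"/"DISABLED", not a boolean
            "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
            "securityGroups": [],  # empty, as in the export above
        }
    }


if __name__ == "__main__":
    sample = {"Subnets": [{"SubnetId": "subnet-aaa"}, {"SubnetId": "subnet-bbb"}]}
    # json.dumps emits double-quoted JSON
    print(json.dumps(build_network_configuration(sample)))
```

    Note that `json.dumps` produces double-quoted JSON, unlike the single-quoted string in the export above.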
    Abhishek

    1 year ago
    @Anna Geller (old account) let me try passing this. I tried with `--run-task-kwargs task_def.yaml`; let me try with `--network-configuration`.
    No luck.
    Anna Geller (old account)

    1 year ago
    What error are you getting now? Are you running this as an ECS service or on EC2?
    Abhishek

    1 year ago
    On EC2.
    ^^ @Anna Geller (old account)
    Kevin Kho

    1 year ago
    `run-task-kwargs` should work. Does yours look like this?
    Also, I guess there is a chance you are running into the same issue as Kyle.
    Anna Geller (old account)

    1 year ago
    @Abhishek if you want to run your agent as an ECS service directly in your cluster instead of on EC2, I tried to automate this deployment via the CLI. I couldn’t pass the AWS_ACCOUNT_ID as a variable, so you would have to search and replace it. I added some comments. @Kevin Kho let me know if I got something wrong 🙂 I followed the documentation. https://gist.github.com/45c99852c22cc44fa260156b47339c0f
    Kevin Kho

    1 year ago
    Did this work? I've seen people before who had to pass the key again.
    And I doubt you’d get anything wrong cuz I reference your articles haha
    Anna Geller (old account)

    1 year ago
    seems to work for me - great to hear 😄
    @Abhishek the above Gist and this Flow seem to be working - sending example Flow in case you want to try that:
    import prefect
    from prefect.storage import S3
    from prefect.run_configs import ECSRun
    from prefect import task, Flow
    
    
    @task
    def say_hi():
        logger = prefect.context.get("logger")
        logger.info("Hi from Prefect %s", prefect.__version__)
    
    
    with Flow(
        "hello-from-ecs",
        storage=S3(bucket="prefect-datasets", key="flows/example_ecs_flow.py", stored_as_script=True),
        run_config=ECSRun(
            labels=["prod"],
            task_role_arn="arn:aws:iam::12345678:role/prefectTaskRole",
            run_task_kwargs=dict(cluster="prefectEcsCluster", launchType="FARGATE",),
        ),
    ) as flow:
        say_hi()
    
    if __name__ == "__main__":
        flow.register("01_Basics")
    Abhishek

    1 year ago
    @Anna Geller (old account) thanks a lot, let me try the steps from the gist. 🙏
    Hi @Anna Geller (old account), we tried the script you shared. It creates the cluster, tasks, and roles, but the same issue persists in task execution. Here is a screenshot of the error:
    Kevin Kho

    1 year ago
    What do your RunConfig and YAML look like? Just omit sensitive info
    Abhishek

    1 year ago
    I am just running the same .sh script that Anna shared; I have not run the flow yet.
    Kevin Kho

    1 year ago
    What is your output of `aws ec2 describe-subnets --region $AWS_REGION`?
    Abhishek

    1 year ago
    We are using an additional option in the above command:
    aws ec2 describe-subnets --region $AWS_REGION --filters Name=vpc-id,Values=vpc-0xxxxx
    to filter by our default VPC.
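    If you would rather do the VPC-filtered lookup in Python, a sketch like the following follows `NextToken` pagination so that subnets beyond the first page are not silently dropped. It assumes `ec2_client` is a boto3 EC2 client (e.g. `boto3.client("ec2", region_name=...)`); `iter_subnet_ids` is a hypothetical helper name, and the VPC ID is a placeholder.

```python
from typing import Any, Dict, Iterator


def iter_subnet_ids(ec2_client: Any, vpc_id: str) -> Iterator[str]:
    """Yield the IDs of all subnets in one VPC, following NextToken pagination.

    ec2_client is assumed to be a boto3 EC2 client, or any object with a
    compatible describe_subnets(**kwargs) method (e.g. a test stub).
    """
    kwargs: Dict[str, Any] = {"Filters": [{"Name": "vpc-id", "Values": [vpc_id]}]}
    while True:
        page = ec2_client.describe_subnets(**kwargs)
        for subnet in page.get("Subnets", []):
            yield subnet["SubnetId"]
        token = page.get("NextToken")
        if not token:
            break
        # Ask for the next page with the same filter
        kwargs["NextToken"] = token
```

    For example, `list(iter_subnet_ids(boto3.client("ec2"), "vpc-0xxxxx"))`. The client is passed in rather than created inside the function so it can be exercised with a stub.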
    Kevin Kho

    1 year ago
    Do you get the expected output when you run that command in the CLI? Maybe you could just hardcode the subnets in the networkConfiguration portion.
    Then I guess you could also make sure that the cluster is indeed part of the VPC.
    Abhishek

    1 year ago
    Kevin Kho

    1 year ago
    Yeah, just make sure the cluster is in one of these subnets (or VPC), and then you can add these to the networkConfiguration when you create the service.
    Anna Geller (old account)

    1 year ago
    @Abhishek thanks for trying this and sharing your logs. It looks like everything worked well except for the very last step, when you create the ECS service. I agree with @Kevin Kho that you could look up the subnet IDs in the AWS management console and set them manually from there to be sure you are using the correct ones. AFAIK, those should be public subnets, i.e. subnets with an internet gateway attached; could it be that you are using private ones? The first image below shows how you can look up the IDs in the management console, and the second shows how you can check whether your subnet has an internet gateway attached. It would also be helpful if you could share your Prefect version; for comparison, I was using the latest version, 0.15.5. The last thing you could try is to run the entire setup in another region to check whether the issue persists.
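    To check the "public subnet" condition without clicking through the console, here is a rough sketch that works on the `RouteTables` list returned by `aws ec2 describe-route-tables --filters Name=vpc-id,Values=<your-vpc>`. It treats a subnet as public if its effective route table (its explicitly associated one, else the VPC's main table) contains a route whose `GatewayId` starts with `igw-`. That is a simplification (it ignores route state and destination CIDR), and `public_subnet_ids` is a hypothetical helper, not an AWS or Prefect API.

```python
from typing import Any, Dict, List, Optional


def _has_igw_route(table: Dict[str, Any]) -> bool:
    # A route via an internet gateway has a GatewayId like "igw-0abc..."
    return any(str(r.get("GatewayId", "")).startswith("igw-") for r in table.get("Routes", []))


def public_subnet_ids(route_tables: List[Dict[str, Any]], subnet_ids: List[str]) -> List[str]:
    """Return the subnets (from subnet_ids) whose effective route table reaches an IGW.

    route_tables: the "RouteTables" list from describe-route-tables for ONE VPC.
    Subnets without an explicit association fall back to the VPC's main table.
    """
    explicit: Dict[str, Dict[str, Any]] = {}
    main_table: Optional[Dict[str, Any]] = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("Main"):
                main_table = table
            elif "SubnetId" in assoc:
                explicit[assoc["SubnetId"]] = table

    public: List[str] = []
    for subnet_id in subnet_ids:
        table = explicit.get(subnet_id, main_table)
        if table is not None and _has_igw_route(table):
            public.append(subnet_id)
    return public
```

    Any subnet ID missing from the result is a candidate "private" subnet that would explain tasks failing to reach the internet.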
    Abhishek

    1 year ago
    @Anna Geller (old account) we tried the same setup with Terraform, taking into account all the suggestions you mentioned (public subnets, proper roles, etc.), and even the Prefect version is the same, but unfortunately the same error is thrown during task execution.
    Kevin Kho

    1 year ago
    Is that the default VPC or you have another one?
    Just to be clear, what edits did you make to Anna’s script? Just the filters?
    Abhishek

    1 year ago
    We are not using the default VPC. We didn’t make any changes to the script except storing secrets in Secrets Manager and using Terraform modules to create the roles, cluster, service, and task definition.
    Kevin Kho

    1 year ago
    Oh wait, you said task execution? So the agent is now running? Or do you mean ECS task execution?
    Abhishek

    1 year ago
    ECS agent is NOT running.
    Kevin Kho

    1 year ago
    Your cluster is backed by EC2, right? Are the EC2 instances in the appropriate subnet?
    Abhishek

    1 year ago
    We are using Fargate mode.
    @Kevin Kho https://github.com/PrefectHQ/prefect/blob/59d23b753bdf2927a1e6f35ff54809234eae8030/src/prefect/agent/ecs/agent.py#L251 This infers the network configuration, but only for the default VPC; it can’t handle a non-default VPC.
    Kevin Kho

    1 year ago
    It should not go there if you supply one. Inferring only happens when you don’t supply a networkConfiguration.
    Are you using FARGATE as the launch type, or should you be using EC2? The inference is called a couple of lines above; it should not be inferring if you supply one.
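    A simplified illustration of the behaviour described here: an explicitly supplied `networkConfiguration` is used as-is, and inference only runs when none is found. This is a hypothetical sketch, not Prefect's actual agent code. The stand-in inference function simply raises the error message from this thread to mimic inference failing in a non-default-VPC setup.

```python
from typing import Any, Callable, Dict

NetworkConfig = Dict[str, Any]


def resolve_network_configuration(
    run_task_kwargs: Dict[str, Any],
    infer_default: Callable[[], NetworkConfig],
) -> NetworkConfig:
    """Hypothetical sketch: use a supplied networkConfiguration, else infer one."""
    # Exact-key lookup: a snake_case "network_configuration" key is not found here,
    # so inference still runs even though the user "supplied" something.
    supplied = run_task_kwargs.get("networkConfiguration")
    if supplied:
        return supplied
    return infer_default()


def infer_without_default_vpc() -> NetworkConfig:
    # Stand-in for inference failing, e.g. when only a non-default VPC exists
    raise ValueError(
        "Failed to infer default networkConfiguration, "
        "please explicitly configure using `--run-task-kwargs`"
    )
```

    The exact-key lookup in the sketch is why the spelling of the key matters: anything other than `networkConfiguration` falls through to inference.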
    Abhishek

    1 year ago
    We are using the FARGATE launch type and are not passing any networkConfiguration explicitly; we just followed the same script and steps that @Anna Geller (old account) shared earlier.
    Kevin Kho

    1 year ago
    What happens when you use launch type EC2?
    Change it here and here.
    Abhishek

    1 year ago
    Okay, let us try.
    Kevin Kho

    1 year ago
    Anna’s script creates a new cluster, though, which I think is not backed by EC2. Did it create a new cluster for you?
    Abhishek

    1 year ago
    Yes, it does. It creates the cluster, service, and task (to run a Prefect agent).
    Trying with the EC2 launch type now.
    Kevin Kho

    1 year ago
    What does your cluster say here? This should be the launch type.
    And here:
    Abhishek

    1 year ago
    FARGATE for both.
    Kevin Kho

    1 year ago
    You can try adding the subnets in this section then
    Abhishek

    1 year ago
    Okay.
    Kevin Kho

    1 year ago
    Abhishek

    1 year ago
    On running with EC2:
    service pm-dev-ecs-prefect was unable to place a task because no container instance met all of its requirements. Reason: No Container Instances were found in your cluster. For more information, see the Troubleshooting section.
    @Kevin Kho the subnets are already passed in Terraform via
    network_configuration {
      subnets          = var.subnet_ids
      assign_public_ip = true
      security_groups  = [aws_security_group.pm-ecs-prefect.id]
    }
    subnet_ids            = [data.terraform_remote_state.pm-dev-vpc.outputs.pm-dev-public-subnet-a,
                               data.terraform_remote_state.pm-dev-vpc.outputs.pm-dev-public-subnet-b,
                               data.terraform_remote_state.pm-dev-vpc.outputs.pm-dev-public-subnet-c]
    `var.subnet_ids` does that.
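    One thing worth noting about the Terraform block: the `network_configuration` of an `aws_ecs_service` uses snake_case keys and a boolean `assign_public_ip`, and as far as I can tell it configures the service running the agent itself, not the flow-run tasks the agent later starts. Those go through ECS RunTask, whose `networkConfiguration` uses camelCase keys and the strings "ENABLED"/"DISABLED". A minimal sketch of the translation, with a hypothetical function name:

```python
from typing import Any, Dict, List, Optional


def terraform_style_to_run_task(
    subnets: List[str],
    assign_public_ip: bool,
    security_groups: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Map Terraform-style network settings onto RunTask's networkConfiguration."""
    awsvpc: Dict[str, Any] = {
        "subnets": list(subnets),
        # Terraform uses a bool; RunTask uses "ENABLED"/"DISABLED"
        "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
    }
    if security_groups:
        awsvpc["securityGroups"] = list(security_groups)
    return {"networkConfiguration": {"awsvpcConfiguration": awsvpc}}
```

    The returned dict is the shape that would go into a run_task_kwargs YAML file or into ECSRun's `run_task_kwargs`, e.g. `terraform_style_to_run_task(["subnet-XXX"], True, ["sg-XXX"])`.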
    Anna Geller (old account)

    1 year ago
    @Abhishek somehow the CLI and Terraform setups don’t seem to work with your VPC configuration. Here are some ideas that you may still try:
    1. I see that you added a security group. Removing it is unlikely to solve the problem you are facing with the VPC configuration, but I don’t think you necessarily need it. I didn’t add any SG, and I think you need one only if you want to SSH into a container or allow incoming traffic, e.g. if this container hosts a website or an API.
    2. Would you give it a try to configure it all purely from the AWS management console (UI)? When you do that, the configuration wizard in “Create ECS service from task definition” allows you to select subnets from a dropdown. This way you make sure to select only those that are available for the service to use.
    3. Could you try using a default VPC instead of a custom one to see whether everything else works in that case? This would tell us whether the VPC or something else causes the problem.
    4. I think I asked about this already, but I’m not sure whether you tried it: could you try configuring everything from scratch in a new AWS region and see whether that works? If one has many resources in one region, it’s easy to confuse things and pick up the wrong subnets.
    I think it should work the same way regardless of whether you choose the Fargate or EC2 capacity provider, because that is only a method of provisioning compute; provisioning containers and starting tasks/services should be the same.
    Sharing in case somebody else comes across the same issue: @Manas Ranjan Kar figured it out. These were the steps:
    1. The command needed was this:
    command: [
        "prefect",
        "agent",
        "ecs",
        "start",
        "--run-task-kwargs",
        "s3://xyz/run_task_kwargs.yaml"
    ],
    2. The custom VPC required creating a custom YAML file and uploading it to S3 for the tasks to pick up. Terraform was used for that, but it was painful to get right initially.
    3. The task definition needs the AWS_DEFAULT_REGION env var as well.
    4. "'dev','etl'" works without the square brackets for the PREFECT__CLOUD__AGENT__LABELS env var.
    5. Creating subnets on the fly and creating route table associations via Terraform took some work.
    6. A few other networking fixes relevant to our environment.
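    Steps 1, 3 and 4 can be sketched as a single ECS container definition for the agent. This is a hypothetical Python helper that only builds the dict (e.g. for boto3's `register_task_definition` or to dump as JSON for Terraform); the container name and image are assumptions, and the agent API key is deliberately left out, since in this thread it was kept in Secrets Manager or passed via `--key`.

```python
from typing import Any, Dict, List


def agent_container_definition(
    image: str,
    run_task_kwargs_uri: str,
    region: str,
    labels: List[str],
) -> Dict[str, Any]:
    """Sketch of an ECS containerDefinitions entry for the Prefect ECS agent."""
    # Step 4: labels as "'dev','etl'" (quoted, comma-separated, no square brackets)
    labels_value = ",".join(f"'{label}'" for label in labels)
    return {
        "name": "prefect-agent",  # hypothetical container name
        "image": image,
        "essential": True,
        # Step 1: point the agent at the run_task_kwargs YAML on S3
        "command": ["prefect", "agent", "ecs", "start", "--run-task-kwargs", run_task_kwargs_uri],
        "environment": [
            # Step 3: the task definition also needs AWS_DEFAULT_REGION
            {"name": "AWS_DEFAULT_REGION", "value": region},
            {"name": "PREFECT__CLOUD__AGENT__LABELS", "value": labels_value},
        ],
    }
```

    For example, `agent_container_definition("prefecthq/prefect", "s3://xyz/run_task_kwargs.yaml", "us-east-1", ["dev", "etl"])`, where the region is a placeholder.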
    Jonathan Buys

    10 months ago
    I'm working through the same issue now. Can anyone share the syntax for a working run_task_kwargs.yaml file?
    Kevin Kho

    10 months ago
    I have one here
    Jonathan Buys

    10 months ago
    Perfect, thank you!
    Hummm… maybe I spoke too soon. Here's my run_task_kwargs.yaml file:
    executionRoleArn: arn:aws:iam::XXX:role/ecsTaskExecutionRole
    containerDefinitions:
    - memory: 1024
      memoryReservation: 512
      volumesFrom: []
      image: prefecthq/prefect
      essential: true
      name: flow
    placementConstraints: []
    memory: '4096'
    taskRoleArn: arn:aws:iam::XXX:role/prefect-ecs-task-role
    requiresCompatibilities:
    - FARGATE
    networkMode: awsvpc
    cpu: '1024'
    network_configuration:
      subnets: [subnet-XXX]
      assign_public_ip: true
      security_groups: [sg-XXX]
    But I'm still getting the `Failed to infer default networkConfiguration` error in ECS.
    Kevin Kho

    10 months ago
    Is it `network_configuration` or `networkConfiguration`? Or did you try both?
    Jonathan Buys

    10 months ago
    I didn't try camelCase; I'll give that a shot.
    Progress! Now it's failing because it doesn't see the agent API token. I'll add that in as well. I think I can sort it out from here, thanks again.
    Kevin Kho

    10 months ago
    Oh ok that sounds better
    Jonathan Buys

    10 months ago
    Good grief, now the error is `no such option: --key` when running my command, defined as:
    "command": [
        "prefect",
        "agent",
        "ecs",
        "start",
        "--key",
        "ZZZZZZ",
        "--run-task-kwargs",
        "s3://ts-codedeploy/run_task_kwargs.yaml"
    ]
    in my task definition. Is the `--key` option a recent addition?
    I bet I need an image update.
    Kevin Kho

    10 months ago
    `--key` is available in 0.15.0 and above.
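    If you are unsure which Prefect version your agent image ships with, a small check like this can tell you whether `--key` should be available, based on the "0.15.0 and above" answer. It is a hypothetical helper that only compares the leading numeric part of the version string; you would feed it `prefect.__version__` from inside the image.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version, e.g. "0.15.0+12.gabc" -> (0, 15, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def supports_key_flag(prefect_version: str) -> bool:
    """True if `prefect agent ... start --key` should exist (0.15.0 and above)."""
    parts = parse_version(prefect_version)
    # Pad so that "0.15" compares equal to (0, 15, 0)
    parts = parts + (0,) * (3 - len(parts))
    return parts >= (0, 15, 0)
```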
    Jonathan Buys

    10 months ago
    It works! Got it, thanks again for your help!
    Kevin Kho

    10 months ago
    Nice!
    Jonathan Buys

    10 months ago
    Just a quick update: to actually get the flows to run to completion, this is the YAML file I needed on S3:
    networkConfiguration:
      awsvpcConfiguration:
        subnets: [subnet-XXXX]
        assignPublicIp: "ENABLED"
        securityGroups: [sg-XXXXXXXXXXX]
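    Since the snake_case vs camelCase mix-up cost some time here, a small sanity check on the parsed YAML may help. This sketch assumes you load the file with PyYAML (`yaml.safe_load`) and pass the resulting dict in; `check_run_task_kwargs` is a hypothetical helper and only checks the network-related keys discussed in this thread, not the full RunTask schema.

```python
from typing import Any, Dict, List


def check_run_task_kwargs(kwargs: Dict[str, Any]) -> List[str]:
    """Return human-readable problems in a parsed run_task_kwargs mapping (empty list if none)."""
    problems: List[str] = []
    if "network_configuration" in kwargs:
        problems.append("use camelCase 'networkConfiguration', not 'network_configuration'")
    net = kwargs.get("networkConfiguration")
    if not isinstance(net, dict):
        problems.append("missing or malformed 'networkConfiguration'")
        return problems
    vpc = net.get("awsvpcConfiguration")
    if not isinstance(vpc, dict):
        problems.append("'networkConfiguration' needs an 'awsvpcConfiguration' mapping")
        return problems
    if not vpc.get("subnets"):
        problems.append("'awsvpcConfiguration.subnets' must list at least one subnet")
    # YAML turns an unquoted true into a bool, but RunTask wants a string here
    if vpc.get("assignPublicIp", "DISABLED") not in ("ENABLED", "DISABLED"):
        problems.append("'assignPublicIp' must be \"ENABLED\" or \"DISABLED\"")
    return problems
```

    For the working file above this returns an empty list; for the earlier file that used `network_configuration`, it reports two problems (the snake_case key and the missing `networkConfiguration`).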
    John Shearer

    9 months ago
    Thanks for this thread, everyone. It helped me get our ECS Prefect setup running on AWS, on an old AWS account without a default VPC.
    FYI, I've used Pulumi for the IaC, based on @Anna Geller’s tutorial. Thanks, Anna! Would anyone more widely be interested in seeing the Pulumi implementation?
    Anna Geller

    9 months ago
    @John Shearer sharing your solution is always welcome! You never know how many people this might help in the future 👍
    Kevin Kho

    9 months ago
    This would be a great blog or Github Discussion!
    Abhishek

    9 months ago
    @John Shearer Yes, that would be great!