# ask-community
a
Problem running the ECS agent: I am running an ECS agent as follows:
```bash
prefect agent ecs start --key $KEY --task-role-arn $TASK_ARN --log-level INFO --label s3_sync --name farget-dev --execution-role-arn $EXEC_ROLE_ARN
```
and it's giving me an error:
```
ValueError: Failed to infer default networkConfiguration, please explicitly configure using `--run-task-kwargs`
```
I created an ECS cluster (the default one) using the following command:
```bash
aws ecs create-cluster
```
which creates a default cluster. Can anyone point out whether I am missing anything?
a
It seems that you are missing the `networkConfiguration`. You could extract the subnet IDs this way:
```bash
export AWS_REGION=set_your_region

SUBNETS=$(aws ec2 describe-subnets --region $AWS_REGION)
export SUBNET1=$(echo $SUBNETS | jq -r '.Subnets | .[0].SubnetId')
export SUBNET2=$(echo $SUBNETS | jq -r '.Subnets | .[1].SubnetId')
export SUBNET3=$(echo $SUBNETS | jq -r '.Subnets | .[2].SubnetId')
```
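If you would rather not depend on `jq`, the same extraction can be sketched in Python. This assumes you have the JSON output of `aws ec2 describe-subnets` as a string; `subnet_ids_from_describe_output` is a hypothetical helper, and unlike the `jq` lines it returns every subnet rather than exactly three. The optional `vpc_id` argument is a client-side stand-in for the CLI's `--filters Name=vpc-id,Values=...` option.

```python
import json
from typing import List, Optional


def subnet_ids_from_describe_output(raw_json: str, vpc_id: Optional[str] = None) -> List[str]:
    """Return subnet IDs from `aws ec2 describe-subnets` JSON output.

    If vpc_id is given, keep only subnets in that VPC (a client-side
    equivalent of `--filters Name=vpc-id,Values=...`).
    """
    data = json.loads(raw_json)
    return [
        subnet["SubnetId"]
        for subnet in data.get("Subnets", [])
        if vpc_id is None or subnet.get("VpcId") == vpc_id
    ]


if __name__ == "__main__":
    # Made-up sample in the shape the AWS CLI returns.
    sample = json.dumps({
        "Subnets": [
            {"SubnetId": "subnet-aaa", "VpcId": "vpc-default"},
            {"SubnetId": "subnet-bbb", "VpcId": "vpc-custom"},
        ]
    })
    print(subnet_ids_from_describe_output(sample))
    print(subnet_ids_from_describe_output(sample, vpc_id="vpc-custom"))
```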
And then you can set it as an environment variable:
```bash
export networkConfiguration="{'awsvpcConfiguration': {'assignPublicIp': 'ENABLED', 'subnets': ['$SUBNET1', '$SUBNET2', '$SUBNET3'], 'securityGroups': []}}"
```
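The value above uses Python-style single quotes, so it is not valid JSON. Here is a minimal sketch of building the same structure in Python and serializing it as JSON, which the AWS CLI accepts for structure parameters such as `aws ecs create-service --network-configuration`. `build_network_configuration` is a hypothetical helper; the camelCase keys and the `"ENABLED"`/`"DISABLED"` strings follow the ECS `awsvpcConfiguration` shape.

```python
import json
from typing import Any, Dict, Sequence


def build_network_configuration(
    subnets: Sequence[str],
    security_groups: Sequence[str] = (),
    assign_public_ip: bool = True,
) -> Dict[str, Any]:
    """Build an ECS `networkConfiguration` value for awsvpc-mode tasks."""
    if not subnets:
        raise ValueError("at least one subnet ID is required")
    awsvpc: Dict[str, Any] = {
        "subnets": list(subnets),
        # ECS expects the strings "ENABLED"/"DISABLED" here, not a boolean.
        "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
    }
    # Omit securityGroups when none are given; ECS then uses the VPC's
    # default security group.
    if security_groups:
        awsvpc["securityGroups"] = list(security_groups)
    return {"awsvpcConfiguration": awsvpc}


if __name__ == "__main__":
    config = build_network_configuration(["subnet-aaa", "subnet-bbb", "subnet-ccc"])
    # JSON text that could be passed to `aws ecs create-service --network-configuration`
    print(json.dumps(config))
```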
Specifying network configuration is important because otherwise Prefect will not know enough about the VPC to which it should deploy your flows.
a
@Anna Geller (old account) let me try passing this. I tried with
```bash
--run-task-kwargs task_def.yaml
```
Let me try with `--network-configuration`.
No luck.
a
What error are you getting now? Are you running this as ECS service or on EC2?
a
on ec2
^^ @Anna Geller (old account)
k
`--run-task-kwargs` should work. Does yours look like this?
Also, I guess there is a chance you are running into the same issue as Kyle.
a
@Abhishek if you want to create your agent as an ECS service directly in your cluster instead of on EC2, I tried to automate this deployment via the CLI. I couldn't pass the AWS_ACCOUNT_ID as a variable, so you would have to search and replace it. I added some comments. @Kevin Kho let me know if I got something wrong 🙂 I followed the documentation. Here is the Gist: https://gist.github.com/45c99852c22cc44fa260156b47339c0f
k
Did this work? I've seen people have to pass the key again before.
And I doubt you'd get anything wrong, since I reference your articles, haha
a
seems to work for me - great to hear 😄
@Abhishek the above Gist and this Flow seem to be working; here is the example Flow in case you want to try it:
```python
import prefect
from prefect.storage import S3
from prefect.run_configs import ECSRun
from prefect import task, Flow


@task
def say_hi():
    logger = prefect.context.get("logger")
    logger.info("Hi from Prefect %s", prefect.__version__)


with Flow(
    "hello-from-ecs",
    storage=S3(bucket="prefect-datasets", key="flows/example_ecs_flow.py", stored_as_script=True),
    run_config=ECSRun(
        labels=["prod"],
        task_role_arn="arn:aws:iam::12345678:role/prefectTaskRole",
        run_task_kwargs=dict(cluster="prefectEcsCluster", launchType="FARGATE",),
    ),
) as flow:
    say_hi()

if __name__ == "__main__":
    flow.register("01_Basics")
```
a
@Anna Geller (old account) thanks a lot, let me try the steps from gist. 🙏
Hi @Anna Geller (old account), we tried the script you shared. It creates the cluster, tasks and roles, but the same issue persists in task execution. Here is the screenshot of the error:
k
What do your RunConfig and YAML look like? Just omit sensitive info
a
I am just running the same .sh script that Anna shared; I did not run the flow yet.
k
What is your output of `aws ec2 describe-subnets --region $AWS_REGION`?
a
We are using an additional option in the above command:
```bash
aws ec2 describe-subnets --region $AWS_REGION --filters Name=vpc-id,Values=vpc-0xxxxx
```
to filter by our default VPC.
k
Do you get the expected outputs when you run that command in the CLI? Maybe you can just hardcode the subnets in the NetworkConfiguration portion.
Then, I guess, also make sure that the cluster is indeed part of the VPC.
a
ec2-describe-subnet-op.json.js
k
Yeah, just make sure the cluster is in one of these subnets (or VPC), and then you can add these to the networkConfiguration when you create the service.
a
@Abhishek thanks for trying this and sharing your logs. It looks like everything worked well except for the very last step, when you create the ECS service. I agree with @Kevin Kho that you could manually look up the subnet IDs in the AWS management console and set them from there to be sure you are using the correct ones. AFAIK, those should be public subnets, i.e. subnets whose route table has a route to an internet gateway; could it be that you are using private ones? The first image below shows how you can look up the IDs in the management console; the second shows how you can check whether your subnet has an internet gateway attached. It would also be helpful if you could share your Prefect version. For comparison, I was using the latest version, 0.15.5. The last thing you could try is to run the entire setup in another region to check whether this issue persists.
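The "public subnet" condition can also be checked without the console by inspecting route tables: a subnet is generally considered public when the route table it uses has a `0.0.0.0/0` route to an internet gateway (`igw-...`), and a subnet without an explicit association uses its VPC's main route table. The sketch below applies that rule to the parsed `RouteTables` list from `aws ec2 describe-route-tables --filters Name=vpc-id,Values=<your-vpc>`. It assumes every table passed in belongs to the subnet's VPC, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional

RouteTable = Dict[str, Any]


def route_table_for_subnet(route_tables: List[RouteTable], subnet_id: str) -> Optional[RouteTable]:
    """Return the route table a subnet uses.

    An explicit subnet association wins; otherwise the subnet falls back
    to the VPC's main route table (assumes all tables are from one VPC).
    """
    main_table: Optional[RouteTable] = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table
            if assoc.get("Main"):
                main_table = table
    return main_table


def is_public_subnet(route_tables: List[RouteTable], subnet_id: str) -> bool:
    """True if the subnet's route table sends 0.0.0.0/0 to an internet gateway."""
    table = route_table_for_subnet(route_tables, subnet_id)
    if table is None:
        return False
    return any(
        route.get("DestinationCidrBlock") == "0.0.0.0/0"
        and str(route.get("GatewayId", "")).startswith("igw-")
        for route in table.get("Routes", [])
    )
```

Usage would look like `is_public_subnet(json.load(f)["RouteTables"], "subnet-...")`, where `f` is a file holding the CLI output.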
a
@Anna Geller (old account) we tried the same setup with Terraform, taking into account all the suggestions you mentioned (public subnets, proper roles, etc.), and even the Prefect version is the same, but unfortunately the same error is thrown during task execution.
k
Is that the default VPC or you have another one?
Just to be clear, what edits did you make to Anna’s script? Just the filters?
a
We are not using the default VPC. We didn't make any changes to the script, except storing secrets in Secrets Manager and using Terraform modules to create the roles, cluster, service and task definition.
k
Oh wait, you said task execution? So the agent is now running? Or you mean ECS task execution?
a
ECS agent is NOT running.
k
Your cluster is backed by EC2, right? Are the EC2 instances in the appropriate subnet?
a
We are using Fargate mode.
@Kevin Kho https://github.com/PrefectHQ/prefect/blob/59d23b753bdf2927a1e6f35ff54809234eae8030/src/prefect/agent/ecs/agent.py#L251 This is inferring network configuration - but only for the default VPC - can’t pass a non default VPC.
k
It should not go there if you supply one. Inferring happens only when you don't supply a networkConfiguration.
Are you using FARGATE as the launch type? And should you be using EC2? The inferring is called a couple of lines above. It should not be inferring since you supply one.
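To make the behavior Kevin describes concrete, here is a simplified, hypothetical model of the decision. It is not Prefect's actual code: an explicitly supplied `networkConfiguration` always wins, and only when it is missing does the agent try to infer one, looking only at the account's default VPC. Tying inference to the `FARGATE` launch type follows Kevin's comments and is an assumption here, as are the function name and the shape of the `vpcs`/`subnets_by_vpc` inputs (modeled on `aws ec2 describe-vpcs` output). An account with only custom VPCs ends up at the same `ValueError` seen in this thread.

```python
from typing import Any, Dict, List, Mapping, Optional

INFER_ERROR = (
    "Failed to infer default networkConfiguration, please explicitly "
    "configure using `--run-task-kwargs`"
)


def resolve_network_configuration(
    run_task_kwargs: Mapping[str, Any],
    launch_type: str,
    vpcs: List[Dict[str, Any]],
    subnets_by_vpc: Mapping[str, List[str]],
) -> Optional[Dict[str, Any]]:
    """Hypothetical, simplified model of the agent's fallback logic."""
    # 1. An explicitly supplied value always wins; nothing is inferred.
    if "networkConfiguration" in run_task_kwargs:
        return dict(run_task_kwargs["networkConfiguration"])
    # 2. Assumption: inference is only attempted for the FARGATE launch type.
    if launch_type != "FARGATE":
        return None
    # 3. Only the default VPC ("IsDefault" in describe-vpcs output) is
    #    considered, so accounts with only custom VPCs fail here.
    default_vpc_ids = [vpc["VpcId"] for vpc in vpcs if vpc.get("IsDefault")]
    subnets = subnets_by_vpc.get(default_vpc_ids[0], []) if default_vpc_ids else []
    if not subnets:
        raise ValueError(INFER_ERROR)
    return {"awsvpcConfiguration": {"subnets": list(subnets), "assignPublicIp": "ENABLED"}}
```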
a
We are using FARGATE launch type.
We are not passing any networkConfiguration explicitly; we just followed the same script and steps that @Anna Geller (old account) shared earlier.
k
What happens when you use launch type EC2?
Change it here and here.
a
Okay. let us try.
k
Anna’s script creates a new cluster though that is not backed by EC2 i think. Did it create a new cluster for you?
a
Yes, it does. It creates the cluster, service and task (to run a Prefect agent).
trying with ec2 launch type
k
what does your cluster say here? this should be the launch type
and here:
a
FARGATE for both.
k
You can try adding the subnets in this section then
a
Okay.
k
Like this
a
On running with EC2:
```
service pm-dev-ecs-prefect was unable to place a task because no container instance met all of its requirements. Reason: No Container Instances were found in your cluster. For more information, see the Troubleshooting section.
```
@Kevin Kho the subnets are already passed in Terraform via:
```hcl
network_configuration {
  subnets          = var.subnet_ids
  assign_public_ip = true
  security_groups  = [aws_security_group.pm-ecs-prefect.id]
}
```
```hcl
subnet_ids = [data.terraform_remote_state.pm-dev-vpc.outputs.pm-dev-public-subnet-a,
              data.terraform_remote_state.pm-dev-vpc.outputs.pm-dev-public-subnet-b,
              data.terraform_remote_state.pm-dev-vpc.outputs.pm-dev-public-subnet-c]
```
`var.subnet_ids` does that.
a
@Abhishek somehow the CLI and Terraform don't seem to work with your VPC configuration. Here are some ideas that you may still try:

1. I see that you added a security group. Removing it is unlikely to solve the problem you are facing with the VPC configuration, but I think you don't necessarily need it. I didn't add any SG, and I think you need one only if you want to SSH into a container or allow incoming traffic, e.g. to use this container to host a website or API.
2. Would you give it a try to configure it all purely from the AWS management console (UI)? When you do that, the "Create ECS service from task definition" configuration wizard allows you to select subnets from a dropdown. This way you make sure to select only those that are available for the service to use.
3. Could you try using a default VPC instead of a custom one to see whether everything else works in that case? This would tell us whether the VPC or something else causes the problem.
4. I think I asked about this, but I'm not sure whether you tried it: could you try configuring everything from scratch in a new AWS region and see whether this works? Sometimes, if one has many resources in one region, it's easy to confuse things and pick up the wrong subnets.

I think it should work the same way regardless of whether you choose the Fargate or EC2 capacity provider, because this is only a method of provisioning compute; provisioning containers and starting tasks/services should be the same.
Sharing in case somebody else comes across the same issue: @Manas Ranjan Kar figured it out. These were the steps:

1. The command needed was this:
   ```
   command: [
       "prefect",
       "agent",
       "ecs",
       "start",
       "--run-task-kwargs",
       "s3://xyz/run_task_kwargs.yaml"
   ],
   ```
2. The custom VPC required creating a custom YAML file and uploading it to S3 for the tasks to pick up. `terraform` was used for that, but it was painful to get right initially.
3. The task definition needs the `AWS_DEFAULT_REGION` env var as well.
4. `"'dev','etl'"` works without the square brackets for the `PREFECT__CLOUD__AGENT__LABELS` env var.
5. Creating subnets on the fly and creating route table associations via `terraform` took some work.
6. A few other networking fixes relevant to our environment.
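For step 2, here is a minimal sketch of generating the S3 file, assuming the agent parses it as YAML (as the `.yaml` name suggests). A plain JSON object like this one is also valid YAML, so the sketch uses only the standard library's `json` module. The function names are hypothetical and the subnet and security group IDs are placeholders; the structure mirrors the `networkConfiguration` / `awsvpcConfiguration` layout that later in this thread turned out to work.

```python
import json
from typing import Any, Dict, Sequence


def run_task_kwargs_for_custom_vpc(
    subnets: Sequence[str], security_groups: Sequence[str]
) -> Dict[str, Any]:
    """Build run-task kwargs that pin the network configuration explicitly."""
    return {
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "assignPublicIp": "ENABLED",
                "securityGroups": list(security_groups),
            }
        }
    }


def render_as_yaml_compatible_json(kwargs: Dict[str, Any]) -> str:
    """Serialize to JSON; a plain mapping like this is also valid YAML."""
    return json.dumps(kwargs, indent=2)


if __name__ == "__main__":
    # Placeholder IDs: replace with your own public subnets and security group.
    print(render_as_yaml_compatible_json(
        run_task_kwargs_for_custom_vpc(["subnet-XXXX"], ["sg-XXXX"])
    ))
```

Assuming the script is saved as `make_run_task_kwargs.py` (a hypothetical name), `python make_run_task_kwargs.py > run_task_kwargs.yaml` followed by `aws s3 cp run_task_kwargs.yaml s3://xyz/run_task_kwargs.yaml` would put the file where the command in step 1 expects it.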
j
I'm working through the same issue now. Can anyone share the syntax for a working `run_task_kwargs.yaml` file?
k
I have one here
j
Perfect, thank you!
Hummm… maybe I spoke too soon. Here's my `run_task_kwargs.yaml` file:
```yaml
executionRoleArn: arn:aws:iam::XXX:role/ecsTaskExecutionRole
containerDefinitions:
- memory: 1024
  memoryReservation: 512
  volumesFrom: []
  image: prefecthq/prefect
  essential: true
  name: flow
placementConstraints: []
memory: '4096'
taskRoleArn: arn:aws:iam::XXX:role/prefect-ecs-task-role
requiresCompatibilities:
- FARGATE
networkMode: awsvpc
cpu: '1024'
network_configuration:
  subnets: [subnet-XXX]
  assign_public_ip: true
  security_groups: [sg-XXX]
```
But I'm still getting the `Failed to infer default networkConfiguration` error in ECS.
k
Is it `network_configuration` or `networkConfiguration`? Or did you try both?
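The distinction matters because Terraform's `aws_ecs_service` resource uses a snake_case `network_configuration` block with a boolean `assign_public_ip`, while, as far as I understand, the agent passes run-task kwargs through to the ECS `RunTask` API via boto3, which expects camelCase keys nested under `awsvpcConfiguration` and the strings `"ENABLED"`/`"DISABLED"`. Here is a minimal sketch of that translation; `terraform_style_to_ecs` is a hypothetical helper.

```python
from typing import Any, Dict, Mapping


def terraform_style_to_ecs(network_configuration: Mapping[str, Any]) -> Dict[str, Any]:
    """Translate Terraform-style snake_case keys into the ECS RunTask shape."""
    awsvpc: Dict[str, Any] = {"subnets": list(network_configuration["subnets"])}
    if "security_groups" in network_configuration:
        awsvpc["securityGroups"] = list(network_configuration["security_groups"])
    if "assign_public_ip" in network_configuration:
        # Terraform uses a boolean; the ECS API uses "ENABLED"/"DISABLED".
        awsvpc["assignPublicIp"] = (
            "ENABLED" if network_configuration["assign_public_ip"] else "DISABLED"
        )
    return {"networkConfiguration": {"awsvpcConfiguration": awsvpc}}
```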
j
I didn't try camel case, I'll give that a shot.
Progress! Now it's failing because it doesn't see the agent API token. I'll add that in as well. I think I can sort it out from here, thanks again.
k
Oh ok that sounds better
j
Good grief, now the error is `no such option: --key` when running my command, defined as:
```json
"command": [
  "prefect",
  "agent",
  "ecs",
  "start",
  "--key",
  "ZZZZZZ",
  "--run-task-kwargs",
  "s3://ts-codedeploy/run_task_kwargs.yaml"
]
```
in my task definition. Is the `--key` option a recent addition?
I bet I need an image update.
k
`--key` is available in 0.15.0 and above.
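Since `--key` only exists from 0.15.0 onward (per the answer above), one quick guard is to check the Prefect version baked into the agent image before relying on the flag. A minimal sketch, assuming a version string such as the one printed by `python -c "import prefect; print(prefect.__version__)"` inside the image; the helper names are hypothetical, and pre-release or local suffixes are ignored, so `0.15.0rc1` counts as 0.15.0.

```python
import re
from typing import Tuple

MIN_VERSION_WITH_KEY = (0, 15, 0)


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like '0.15.5' or '0.15.5+3.gabc'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def supports_key_flag(prefect_version: str) -> bool:
    """True if the agent CLI should accept `--key` (0.15.0 and above)."""
    return release_tuple(prefect_version) >= MIN_VERSION_WITH_KEY
```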
j
It works! Got it, thanks again for your help!
k
Nice!
j
Just a quick update: to actually get the flows to run to completion, this is the YAML file I needed on S3:
```yaml
networkConfiguration:
  awsvpcConfiguration:
    subnets: [subnet-XXXX]
    assignPublicIp: "ENABLED"
    securityGroups: [sg-XXXXXXXXXXX]
```
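Compared with the earlier attempt, this working file differs in three ways: the key is camelCase `networkConfiguration`, the settings are nested under `awsvpcConfiguration`, and `assignPublicIp` is the string `"ENABLED"` rather than a boolean. Here is a sketch of a pre-upload check for those pitfalls, assuming the file has already been parsed into a dict (for example with a YAML loader); `check_run_task_kwargs` is a hypothetical helper and only covers the mistakes seen in this thread.

```python
from typing import Any, List, Mapping


def check_run_task_kwargs(kwargs: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed run_task_kwargs mapping."""
    problems: List[str] = []
    if "network_configuration" in kwargs:
        problems.append("use camelCase 'networkConfiguration', not 'network_configuration'")
    network = kwargs.get("networkConfiguration")
    if network is None:
        problems.append("missing 'networkConfiguration'")
        return problems
    awsvpc = network.get("awsvpcConfiguration")
    if awsvpc is None:
        problems.append("settings must be nested under 'awsvpcConfiguration'")
        return problems
    if not awsvpc.get("subnets"):
        problems.append("'awsvpcConfiguration.subnets' must list at least one subnet")
    # An unquoted YAML `true` would parse as a boolean, which ECS rejects here.
    if "assignPublicIp" in awsvpc and awsvpc["assignPublicIp"] not in ("ENABLED", "DISABLED"):
        problems.append("'assignPublicIp' must be the string 'ENABLED' or 'DISABLED'")
    return problems
```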
j
Thanks for this thread everyone. Helped me get our ECS prefect set up on AWS - on an old aws account without a default vpc.
FYI, I've used Pulumi for the IaC, based on @Anna Geller’s tutorial. Thanks, Anna! Would anyone be interested in seeing the Pulumi implementation?
a
@John Shearer sharing your solution is always welcome! You never know how many people this might help in the future 👍
k
This would be a great blog or Github Discussion!
a
@John Shearer Yes, that would be great!