Prefect Community #ask-community

Hi all, I am trying to run a Prefect agent on ECS ...

Alex Welch

02/16/2021, 2:17 PM

Hi all, I am trying to run a Prefect agent on ECS and have been following this tutorial. I used the configuration below (along with exporting the runner token, AWS creds, etc. to environment variables). However, I keep receiving this error:

ValueError: Failed to infer default networkConfiguration, please explicitly configure using --run-task-kwargs

I have tried all the different combinations I can think of and been through the docs (both Prefect and ECS) but can't find a solution that works. Has anyone else been able to solve this?

export networkConfiguration="{'awsvpcConfiguration': {'assignPublicIp': 'ENABLED', 'subnets': ['<EC2 instance subnet>'], 'securityGroups': ['<EC2 instance security group>']}}"
prefect agent ecs start --token $RUNNER_TOKEN_FARGATE_AGENT \
 --task-role-arn=arn:aws:iam::<AWS ACCOUNT ID>:role/ECSTaskS3Role \
 --log-level INFO --label prod --label s3-flow-storage \
 --name prefect-prod

Alex Welch

02/16/2021, 2:21 PM

when i run --log-level DEBUG it looks like nothing is being passed to the environment variables. I’m thinking this is the problem?

Alex Welch

02/16/2021, 2:23 PM

but when i add in the networkConfiguration as an --env I get

Error: Got unexpected extra arguments

ale

02/16/2021, 2:30 PM

Hey @Alex Welch. From the docs it seems that if you want to provide networkConfiguration you have to provide it in a YAML file. Something like this:

prefect agent ecs start --run-task-kwargs /path/to/options.yaml

ale

02/16/2021, 2:34 PM

prefect agent ecs start --run-task-kwargs s3://bucket/path/to/options.yaml

If you want to provide the YAML file from S3

Alex Welch

02/16/2021, 3:18 PM

i noticed that, but i didn't see an example of what the yaml needed to look like
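
For readers looking for the same thing, here is a minimal sketch of what such a file might contain. It assumes the agent passes the file's contents along as extra keyword arguments when it launches the ECS task, which is what the --run-task-kwargs name and the error message suggest. The helper name render_run_task_kwargs and the subnet and security group IDs are placeholders, not values from this thread.

```python
def render_run_task_kwargs(subnets, security_groups, assign_public_ip="ENABLED"):
    """Render a YAML document with a networkConfiguration block.

    Hypothetical helper: the key names mirror the networkConfiguration
    dict from the original question; the values are supplied by the caller.
    """
    lines = [
        "networkConfiguration:",
        "  awsvpcConfiguration:",
        f"    assignPublicIp: {assign_public_ip}",
        "    subnets:",
    ]
    lines += [f"      - {subnet}" for subnet in subnets]
    lines.append("    securityGroups:")
    lines += [f"      - {group}" for group in security_groups]
    return "\n".join(lines) + "\n"


# Placeholder IDs; substitute the subnet and security group for your cluster.
options_yaml = render_run_task_kwargs(
    subnets=["subnet-0123456789abcdef0"],
    security_groups=["sg-0123456789abcdef0"],
)
print(options_yaml)
```

Saving the printed text as options.yaml and pointing --run-task-kwargs at that path, as suggested above, would be the intended use.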

Alex Welch

02/16/2021, 3:18 PM

i found some docs on the available environment variables in prefect and networkConfiguration wasn’t one of them

Alex Welch

02/16/2021, 3:18 PM

https://github.com/PrefectHQ/prefect/blob/master/src/prefect/config.toml

ale

02/16/2021, 3:25 PM

This might help you: https://github.com/PrefectHQ/prefect/blob/18fcb5dfce7ba3ecf9db3c6a6f0efa21d9403d88/tests/agent/test_ecs_agent.py#L244 Maybe someone from the Prefect team can point you to the kwargs.yaml, I’m not able to find it 😅

Alex Welch

02/16/2021, 3:38 PM

thanks, i'll give it a try

👍 1

Alex Welch

02/16/2021, 3:44 PM

looks like that solved the networking part… i think? I now get

Failed to load and execute Flow's environment: UnpicklingError("invalid load key, '{'.")

when i run my flow

ale

02/16/2021, 3:45 PM

Would you mind sharing the YAML content? I’m not sure this is related to networkConfiguration, though

Alex Welch

02/16/2021, 3:48 PM

Alex Welch

02/16/2021, 3:52 PM

based on some earlier threads, I did validate that both my flow and the ECS prefect versions were using 0.14.8

Alex Welch

02/16/2021, 3:53 PM

and that cloudpickle is running at 1.6.0

ale

02/16/2021, 3:58 PM

Can you share your logs? Maybe setting logs to debug (if not already on debug level) would help in troubleshooting

Alex Welch

02/16/2021, 3:59 PM

Alex Welch

02/16/2021, 4:00 PM

that is what I have

Alex Welch

02/16/2021, 4:00 PM

in the actual agent it looks like

Alex Welch

02/16/2021, 4:00 PM

ale

02/16/2021, 4:02 PM

I suggest you restart the agent with the following options:

--show-flow-logs --verbose

This way you can see the flow run logs in the agent console

Alex Welch

02/16/2021, 4:04 PM

Error: no such option: --show-flow-logs

Alex Welch

02/16/2021, 4:05 PM

this is what is available

ale

02/16/2021, 4:06 PM

Try setting log level to debug

ale

02/16/2021, 4:06 PM

I also suggest you try calling the function prefect.utilities.debug.is_serializable on your flow: https://docs.prefect.io/api/latest/utilities/debug.html#functions
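
As a rough illustration of what a serializability check does, here is a standard-library sketch. It is not Prefect's implementation of is_serializable: the function name survives_pickle_round_trip is hypothetical, and it uses pickle rather than the cloudpickle library mentioned earlier in the thread.

```python
import pickle
import threading


def survives_pickle_round_trip(obj):
    """Return True if obj can be pickled and loaded back without an error."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:
        # Any failure during dumps/loads means the object is not portable.
        return False
    return True


print(survives_pickle_round_trip({"retries": 3}))    # plain data
print(survives_pickle_round_trip(threading.Lock()))  # an OS-level lock
```

The first call should report True and the second False, since lock objects cannot be pickled. A flow that captures something like an open connection or a lock would fail a check of this kind in the same way.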

Alex Welch

02/16/2021, 4:09 PM

so i have log level set to debug

Alex Welch

02/16/2021, 4:09 PM

give me one second to dig into that doc

Alex Welch

02/16/2021, 4:11 PM

here is my flow

Alex Welch

02/16/2021, 4:11 PM

Alex Welch

02/16/2021, 4:12 PM

oh! it worked

Alex Welch

02/16/2021, 4:13 PM

but…. why lol

ale

02/16/2021, 4:13 PM

😅

Alex Welch

02/16/2021, 4:13 PM

well thats amazing

ale

02/16/2021, 4:14 PM

Well, we don’t know how we did it….but we did it!

Jim Crist-Harif

02/16/2021, 5:00 PM

Failed to load and execute Flow's environment: UnpicklingError("invalid load key, '{'.")

This error indicates that the image used to run the flow was older than prefect 0.14.3, while the version of prefect used to register the flow was >= 0.14.3. In 0.14.3 we changed the serialization format to include some metadata like version numbers, so in the future if this happens you'll get a nice error message. We've never made guarantees that running a flow using a version of prefect older than the version the flow was registered with would work, so this wasn't really a breaking change, but the error that happens if you've accidentally been doing this is a bit cryptic.
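
The exact wording of that message can be reproduced with the standard library alone. The sketch below assumes the newer payload begins with a JSON object (the '{' in the message hints at this, but the real Prefect format is not shown in this thread) and that an older reader hands the bytes straight to pickle. The payload contents are made up for illustration.

```python
import json
import pickle

# Hypothetical stand-in for a newer payload that starts with JSON metadata.
payload = json.dumps({"versions": {"prefect": "0.14.8"}}).encode("utf-8")

try:
    # An older reader that expects a raw pickle stream chokes on the
    # first byte, because '{' is not a valid pickle opcode.
    pickle.loads(payload)
    error_message = None
except pickle.UnpicklingError as exc:
    error_message = str(exc)

print(error_message)
```

Seeing '{' as the invalid load key is therefore a hint that the reader and writer disagree about the format, rather than that the flow itself is broken.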

Jim Crist-Harif

02/16/2021, 5:05 PM

Regarding the networkConfiguration thing, specifying that via a yaml file provided to --run-task-kwargs is the correct way to fix it. Is there something we could do to make this clearer? The error message you got specifically calls out using --run-task-kwargs to fix it, and the behavior of this flag is documented in both the CLI help and the docs: https://docs.prefect.io/orchestration/agents/ecs.html#custom-runtime-options. Happy to make any docs updates needed to help future users who may run into this issue.

ale

02/17/2021, 8:46 AM

Hey @Jim Crist-Harif 🙂 Thanks a lot for the explanation of the error, very useful!

Alex Welch

02/18/2021, 4:27 PM

@Jim Crist-Harif thank you for the explanation. I might recommend either adding some verbiage to the pickling error message that recommends checking your prefect versions, or putting something on a troubleshooting page in the docs
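
A check along those lines could be as small as the sketch below. It assumes plain dotted numeric version strings such as 0.14.8 and encodes the rule Jim describes: the environment running the flow should not be older than the one that registered it. The function names are hypothetical and nothing here calls Prefect.

```python
def parse_version(version):
    """Turn a dotted numeric version like '0.14.8' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def runtime_is_new_enough(registered_with, running_on):
    """True when the runtime version is at least the registration version."""
    return parse_version(running_on) >= parse_version(registered_with)


# Same version on both sides, then a runtime image older than registration.
print(runtime_is_new_enough("0.14.8", "0.14.8"))
print(runtime_is_new_enough("0.14.8", "0.14.2"))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.14.10" would sort before "0.14.3".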

Alex Welch

02/18/2021, 4:28 PM

i was following a tutorial you guys had put out but (either i missed it or it wasn’t in the article) i didn't see anything about the version of prefect used

Alex Welch

02/18/2021, 4:29 PM

regarding the networkConfiguration, I would recommend some documentation around how best to use and structure --run-task-kwargs

Alex Welch

02/18/2021, 4:32 PM

when i was doing my research into how to use them I got hung up on what arguments prefect could take and ended up here. It was only after @ale shared the code from the test_ecs_agent that I understood what to pass and how. So maybe an example of what that options.yml could or should look like? and again - I’ve seen the networkConfiguration issue mentioned in other threads here. Maybe it makes sense to put it in a Troubleshooting section of the documentation?

Alex Welch

02/18/2021, 4:32 PM

here is the test_ecs_agent code

Sean Talia

02/25/2021, 7:07 PM

i'm also just starting to play around with the ECS agent and am running into some of the same difficulties 😎

Sean Talia

02/25/2021, 7:09 PM

why is part of the setup for the agent to try to fetch the default VPC and then set assignPublicIp='ENABLED' on the vpc configuration?

Sean Talia

02/25/2021, 7:10 PM

do you NEED to have the assignPublicIp key set to ENABLED?

Sean Talia

02/25/2021, 7:18 PM

@Jim Crist-Harif i do think that it would be helpful for there to be some mention in the ECS Agent documentation that you need to either have a default VPC set or you must feed some required set of specifics about your VPC + network configuration (e.g. subnets, security groups, etc.)
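
To make that requirement concrete, here is a small sketch of the kind of pre-flight check it implies. It assumes the run-task kwargs have already been loaded into a dict shaped like the networkConfiguration in the original question, and it treats subnets as the one field that must be present. find_network_config_problems is a hypothetical name, not part of Prefect or boto3.

```python
def find_network_config_problems(run_task_kwargs):
    """Return a list of human-readable problems; empty means it looks usable."""
    network = run_task_kwargs.get("networkConfiguration")
    if not network:
        return ["networkConfiguration is missing"]
    vpc = network.get("awsvpcConfiguration")
    if not vpc:
        return ["networkConfiguration.awsvpcConfiguration is missing"]
    problems = []
    if not vpc.get("subnets"):
        problems.append("awsvpcConfiguration.subnets must list at least one subnet")
    if vpc.get("assignPublicIp", "DISABLED") not in ("ENABLED", "DISABLED"):
        problems.append("assignPublicIp must be ENABLED or DISABLED")
    return problems


# Placeholder IDs, for illustration only.
good = {
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "securityGroups": ["sg-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    }
}
print(find_network_config_problems(good))
print(find_network_config_problems({}))
```

Running a check like this before starting the agent would surface a missing subnet list locally instead of as a failed task launch.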

Alex Welch

02/25/2021, 10:02 PM

I think that is a great point

Alex Welch

02/25/2021, 10:02 PM

having something that explains the reasoning and/or alternatives

Jim Crist-Harif

02/25/2021, 11:18 PM

I confess that I'm not the best at AWS stuff, so there might be a better way. We tried to make this as simple as possible - with zero configuration specified, the ECS agent attempts to infer the proper VPC configuration for you. For most deployments you'll need assignPublicIp configured, since otherwise you won't have public internet access (so pulling an image from dockerhub wouldn't work). Users that would want to override this probably also know what to do since this requires extra AWS effort on their end.

👍 1

Jim Crist-Harif

02/25/2021, 11:18 PM

I agree this could be better documented though.

Alex Welch

02/26/2021, 4:32 AM

Users that would want to override this probably also know what to do since this requires extra AWS effort on their end

In my case, all the networking had been handled by another team and I didn’t have any control over it. To make things more complicated, I did not have access to the default VPC, so I had to go the route I ended up taking. I think this may be true for many analytics-engineer type roles. The infra/network teams would handle all of that stuff to ensure security.

felice

02/27/2021, 2:17 AM

hi all, this was very helpful - thank you! i had the same problem when trying to run on a cluster in my company's aws account (though no problems in my own account), so likely it was specific to the networking setup by our infra/devops team, which, similar to @Alex Welch, i have no control over.
