# prefect-community
d
Hey guys, does anyone have any experience using the FargateTaskEnvironment? Specifically the `aws_session_token`: I'm having problems getting AWS to accept it, and I keep getting the following error:
Failed to load and execute Flow's environment: ClientError('An error occurred (UnrecognizedClientException) when calling the RegisterTaskDefinition operation: The security token included in the request is invalid')
Sample config for the environment:
```python
flow.environment = FargateTaskEnvironment(
    launch_type="FARGATE",
    region="us-east-1",
    aws_session_token="FwoGZXIvYXdzEOL//////////wEaDFSgRMmr3M07yXJ3gCKCAdZO6f/LRZc6b7DjDip0lrTvCa+FDQpFAGJEyB6Ka1tF9By3fTgKkbqSM6EnuHQgTEviQJrOn13E7wlvKKVV1++YGaa3gb1Pn9q12BxCN7I6SvQ8oBW9AE73Judo0tuYTdTc5eYC7m2PaYU/d5fkRIj29EJWp9EpO3+yq/S1saxiGvYosKGy+AUyKBOOerH4ymGN9lxo/5aprU5DXumyaC6yg2satDNNoUdPaSBQ2R8fNq0=",
    cpu="256",
    memory="512",
    enable_task_revisions="True",
    ....
    ....
)
```
z
Hi @Darragh, are you able to use the session token to create a task definition manually? Trying to drill down into whether the issue is on the token/AWS IAM side or the Prefect side.
d
Hey @Zachary Hughes Couple of interesting points on that:
• Creating a manual task definition doesn't seem to require a session token at all
• When I create a TaskDefinition using the 2 images I'm trying to build into my Prefect Flow and run it manually, they both run
• The TaskDefinition created by Prefect apparently can't use the Fargate launch type? Screenshot below.
FargateTaskEnvironment code
z
When you say that creating a task definition doesn't seem to require a session token, is there any chance your local environment is falling back on stored AWS credentials or local config?
d
Creating a TaskDefinition manually in the AWS console doesn't require a session token, so it shouldn't be anything to do with the local env… The Prefect docs seem to suggest the session token is needed, but I'm not seeing why?
z
Ah, okay. Since AWS probably handles most of the IAM when you interact through the console, I'd presumed you were working with boto3. From my understanding, you might not need the session token: the environment accepts all the forms of auth that boto3 does, and session tokens are generally only used for temporary access. I'd recommend giving this a shot without the `aws_session_token` and seeing how it behaves.
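For anyone following along, the fallback order described here can be sketched in plain Python. This is an illustration of boto3-style credential resolution (explicit kwargs first, then environment variables), not Prefect's actual code:

```python
import os

def resolve_aws_credentials(aws_access_key_id=None,
                            aws_secret_access_key=None,
                            aws_session_token=None):
    """Illustrative sketch of boto3-style credential fallback:
    explicit arguments win, otherwise environment variables are used.
    The session token stays None for long-lived (non-STS) credentials."""
    return {
        "aws_access_key_id": aws_access_key_id or os.getenv("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": aws_secret_access_key or os.getenv("AWS_SECRET_ACCESS_KEY"),
        # Optional: only needed for temporary (STS) credentials.
        "aws_session_token": aws_session_token or os.getenv("AWS_SESSION_TOKEN"),
    }
```

The point is that leaving `aws_session_token` unset is a perfectly valid configuration when you're authenticating with a long-lived key pair.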
d
Yeah I tried that previously @Zachary Hughes, this is what I got:
Failed to load and execute Flow's environment: ParamValidationError('Parameter validation failed:\nMissing required parameter in input: "taskDefinition"')
z
Okay, gotcha. Just wrapped up reading through the previous thread about this, so hopefully I've got a bit more context to help with now. How are you creating this token? The fact that it's saying the token is invalid rather than malformed makes me think it might be expired. Alternatively, I'd expect to be able to provide something other than an `aws_session_token`. We can definitely open an issue for that change if you'd like.
d
It shouldn't have been expired, as I created it manually just before testing, but always possible. Could you confirm whether or not the `aws_session_token` is definitely needed? From the output I got above [running without the token] my config is definitely missing something, it's just not telling me what it is 🙂
z
Checking the documentation and the code itself, I don't think it should be required. So if just removing that value throws the error you posted, I think it's definitely unintentional.
d
Ok thanks - so with that last error above, `Missing required parameter`, it doesn't seem to be specific to the token. Is there any way of getting more info on what that error is?
z
I'm not sure if this is the culprit or tangential, but it looks like you're not passing a `taskDefinition`. I think you'll also need to pass a `taskDefinition` that matches the `family`, as mentioned in this link: https://docs.prefect.io/api/latest/environments/execution.html#fargatetaskenvironment
Looks like `Missing required parameter` is coming from boto3 itself, but taking a look to see if we can find any additional information.
Is this error being raised by the agent? If so, we pull the `family` argument from the `task_definition_name`, so that could definitely be the missing parameter for `taskDefinition`.
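To make the family/taskDefinition relationship concrete, here's a hedged sketch of the two boto3 payloads involved and the field that has to line up between them. The names here are illustrative, not taken from Darragh's config:

```python
def build_fargate_call_kwargs(task_definition_name, container_definitions):
    """Illustrative only: the 'family' passed to register_task_definition
    and the 'taskDefinition' passed to run_task must refer to the same
    name, otherwise run_task has no task definition to resolve."""
    register_kwargs = {
        "family": task_definition_name,
        "containerDefinitions": container_definitions,
    }
    run_kwargs = {
        "taskDefinition": task_definition_name,  # must match the family above
        "launchType": "FARGATE",
    }
    return register_kwargs, run_kwargs
```

If the two names drift apart, boto3 surfaces exactly the kind of `Missing required parameter in input: "taskDefinition"` validation error seen earlier in the thread.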
d
It’s being raised in the UI as the Flow status, so my guess is the agent. Testing the taskDefinition change now, good catch 👍
Well this is a new and very unwelcome development…
Failed to load and execute Flow's environment: InvalidParameterException('An error occurred (InvalidParameterException) when calling the RunTask operation: Task definition does not support launch_type FARGATE.')
z
That's definitely unanticipated. I think the answer is probably "yes," but the default value for `launch_type` is FARGATE-- do you see this error even when you don't specify `launch_type` in your environment?
It's also super unlikely, but what version of the AWS CLI are you running? If it's out of date, that could explain the inability to accept FARGATE as a launch type.
d
I was on 1.16, upgrading now to test… Leaving the launch type out of the config didn’t make any difference
Upgrade didn’t solve it either, still the same error
z
Doing some digging here-- not 100% sure what could be happening on the AWS end to prompt this error.
A lot of the `Task definition does not support launch_type FARGATE` errors I'm seeing online have to do with not properly setting `requiresCompatibilities`. Can you confirm that you haven't set your agent's `launch_type` to EC2, and try specifying FARGATE for `requiresCompatibilities` in your task definition?
d
Is `requiresCompatibilities` on the agent or the Flow environment, or both?
z
It's on both. As long as you haven't overridden the agent's `launch_type`, it should default to setting the appropriate `requiresCompatibilities` field there. You may need to specify it for your flow environment, though.
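As a reference point, a task definition that AWS will accept for the FARGATE launch type generally needs at least the shape below. This is a sketch based on the AWS task definition requirements; the family, image, and sizing values are illustrative, not from the thread:

```python
def minimal_fargate_task_definition(family, image):
    """Sketch of the fields AWS requires before RunTask with
    launchType=FARGATE will succeed: requiresCompatibilities must
    include FARGATE, networkMode must be awsvpc, and cpu/memory
    must be set at the task level."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],  # missing this triggers the
                                                 # "does not support launch_type FARGATE" error
        "networkMode": "awsvpc",                 # Fargate only supports awsvpc
        "cpu": "256",
        "memory": "512",
        "containerDefinitions": [{"name": "flow", "image": image}],
    }
```

Checking the registered task definition in the ECS console against this shape is a quick way to spot which field didn't make it through.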
d
launch_type is FARGATE on the agent, so should I change?
Still same error either way
z
Nope, your agent sounds good-- you'd want to update that field on the environment. But you're still seeing this even with the environment's `requiresCompatibilities` set?
d
Yeah, just double checked there, still the same error with `requiresCompatibilities` set on the env.
z
That is very, very strange. If you're able to, I'd try putting some breakpoints or print statements around this line to see what's going on on the agent side of this: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/agent/fargate/agent.py#L670-L675 and this line if you're interested in the environment side of this: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/environments/execution/fargate/fargate_task.py#L271 Barring that, it would be useful to see if you can recreate this issue using pure boto3. The issue appears to be on the AWS side, but it doesn't appear to be raising a particularly informative error.
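One low-tech way to get those print statements without editing Prefect itself is to wrap the boto3 client in a logging proxy. This is a debugging sketch, not part of Prefect; you'd have to arrange for the wrapped client to be the one the agent or environment actually uses:

```python
class LoggingClient:
    """Thin wrapper that prints every method call made on a boto3
    client, so you can see exactly what register_task_definition and
    run_task receive before AWS rejects them."""
    def __init__(self, client):
        self._client = client

    def __getattr__(self, name):
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr
        def wrapper(*args, **kwargs):
            # Print the operation name and which kwargs were supplied.
            print(f"boto3 call: {name} kwargs={sorted(kwargs)}")
            return attr(*args, **kwargs)
        return wrapper
```

Usage would look like `ecs = LoggingClient(boto3.client("ecs"))`, after which every `ecs.run_task(...)` call logs its arguments before passing them through unchanged.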
d
Print statements I can do, but breakpoints on an agent running on EC2 could be tricky 😁 With respect to trying with pure boto3, I'm not sure what that would prove, from a user perspective? I've tried doing a manual task definition with 2 containers (which is all I'm trying to prove right now) and that works on Fargate, but not when I define it as a Prefect Fargate task… I'll try the print statements tomorrow as it's hitting midnight now. Thanks for all the help @Zachary Hughes!!
So, new and interesting things to report @Zachary Hughes! Question the first though, cos I may have made a massively stupid assumption. If I'm using FargateTaskEnvironment, what agent should I be using?
z
I'd anticipate you using a Fargate agent!
d
And you would be correct! HOWEVER. Strange and interesting things happen, due to what's possibly a dodgy config on my part. I had a `containerDefinitions` section in my agent config, and when I added logging to the lines you pointed out above, I got the following back:
Untitled
The response from `boto3.register_task_definition` was returning this, which shows it using the `containerDefinitions` data from the agent, rather than the flow.
But to make it extra painful, when I remove the `containerDefinitions` section from my agent, I now don't get the debug response output from my agent at all. I'm wondering is there some funky combination of configs between my agent and the environment that's causing the problem…?
And the output from the `run_task`:
Untitled
Neither of those JSON chunks shows any hint of the 2nd container defined in my flow.
z
You mentioned that when you remove `containerDefinitions` from the agent, you don't get your debug statement. Does the attempted run still fail with the same error we were discussing yesterday?
d
Yeah, still the same error: `Task definition does not support launch_type FARGATE.` I've removed the launch type from both the agent and the env, no luck.
z
I think the interaction between the agent's configuration and the environment's configuration could explain why you're not seeing the second container defined in your flow, but it still doesn't explain the `Task definition does not support launch_type FARGATE` error you get when the `requiresCompatibilities` field is clearly showing FARGATE.
Stop me if you've already tried this, but can you create and run any Fargate tasks on your cluster?
d
Yeah, it's odd. I've no issues creating other Fargate tasks; in fact I can scale out a bunch of parallel cloned flows on up to 100 nodes, but that's using the `FargateAgent` and `LocalEnvironment(executor=LocalExecutor())`.
I'll try pinging it off AWS support, see if they can dig into where the request is going wrong. Their internal diagnostics tend to be pretty good, so I've got the fingers crossed 🙂
z
Yeah. Sorry I can't be of more assistance, but the config you're providing to boto3 looks like it should work.
g
Hey @Darragh - did you ever get your config working? Would you be willing to share the client side (executor/environment, etc.) and agent setup? I was also wondering if you were working with an existing cluster or setting one up dynamically. And if the former, are you in the default cluster or a different one?
tia...
d
Hey @Greg Desmarais haven't gotten a resolution on it yet, I need to push it to AWS support this evening and see if they can shed any light. If I get anything back I'll post it and share!
Hey @Zachary Hughes Back on this again. I've added logging output for all of the args that get passed to the boto3 client, and it doesn't look like the containers I configure in my FargateTaskEnvironment are being passed along. Here's a whole chunk of output that might be useful to you… I'm holding off passing to AWS Support until we know definitively if the issue is on me/us or them.
fargate agent output
I can't see any mention of the 2 containers in there anywhere, despite being named in the flow's `FargateTaskEnvironment`…
z
Hmm, that's a bit odd. Is this still failing with the error about Fargate not being supported? I notice that `requiresCompatibilities` does seem to be properly set.
d
Yeah, same error.
z
Hm. Given what I've read about that error, that feels like something wonky on the AWS side to me. I'm not 100% sure about the rest of it, but I'm fairly certain your code shouldn't be failing in that particular manner.
g
Thanks for getting back, @Darragh. I've gotten close, but it took a lot of debugging of the agent and trial and error with parameter names. At this point, I have moved on to working towards a different approach to clustering (with Prefect).
d
Quick update guys, no answer back from AWS yet, but I did notice something I should've spotted before. My Flow configuration uses FargateTaskEnvironment, adds 2 containers in `containerDefinitions`, and also has the property `taskDefinition` set. `taskDefinition` is set to `my-flow`. Now what I've just spotted is that this Flow [source above somewhere] actually creates 2 Task Definitions in ECS:
• Task Definition A - `my-flow` -> correctly configured with 2 containers, etc…
• Task Definition B - `parallell-collector` - this is the name of the overall Flow, but it only contains 1 container, the one defined by the Docker Storage in the Flow
Seems a little odd that it would be creating 2 Task Definitions here. As an even more entertaining note, manually running `my-flow` on Fargate actually succeeds; I can see the TaskDefinition run to completion and dig into its logs. Any thoughts? I'm perfectly happy to accept an answer of "Oh your config is batshit, here's what it should really look like" 🙂
z
Hi Darragh, apologies for the delay in response. To be honest, I'm not sure what your task definitions should look like. If you're up for inspecting your Prefect environment, you can try something along the lines of
```python
flow.environment = FargateTaskEnvironment(...)
flow.environment.setup(flow)
flow.environment.execute(flow)
```
If you're running into issues with your Prefect environment, we can open an issue to take a look. But I'm not 100% sure what your flow should look like on the Fargate side.
d
@Zachary Hughes Wow, instant fail came back out of that one!! Using your suggestion I got the following:
```
Traceback (most recent call last):
  File "flowRunner.py", line 44, in <module>
    python_dependencies=args.python_dependencies, execution=args.execution)
  File "./definitions/data/parallell/collector/parallellCollector.py", line 109, in main
    flow.environment.execute(flow)
  File "/Users/Darragh/.pyenv/versions/3.6.10/lib/python3.6/site-packages/prefect/environments/execution/fargate/fargate_task.py", line 309, in execute
    **self.task_run_kwargs,
  File "/Users/Darragh/.pyenv/versions/3.6.10/lib/python3.6/site-packages/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/Darragh/.pyenv/versions/3.6.10/lib/python3.6/site-packages/botocore/client.py", line 635, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidParameterException: An error occurred (InvalidParameterException) when calling the RunTask operation: Task definition does not support launch_type FARGATE.
```
Hey @Zachary Hughes I know I keep bugging you about this stuff; is there anyone in the group who's an expert on the FargateTaskEnvironment? They might be able to get me sorted and stop me annoying you with questions 🙂
g
Hey @Darragh - I ended up bailing on the Fargate approach. I found too many issues that I had to battle, and in the end we will want to do some things with our images that the ec2 instances use in the cluster (mounting nfs/efs, etc.). I'm sorry - I got close, but failed, and bailed.
d
Hey @Greg Desmarais Yeah I’m close to that point too, it’s causing me all kinds of pain. I’d still like to make it work as it satisfies our use case better, but if it comes to it then the bail is the only way
z
Hi @Darragh, absolutely not bugging! I've been pinging some of these questions off the team, so there's a bit of collective Fargate knowledge being surfaced here. That said, with this `Task definition does not support launch_type` issue, I think your best bet is to either open this issue up to the channel at large and see if anyone in the community has hit it, or to see what AWS has to say. You're also welcome to open an issue so we can socialize it and potentially figure out what's going on here!
g
Ok @Darragh - let me give more than a cursory look at the error...
Where is your prefect server and agent running?
I ask because including AWS creds implies that the process creating the cluster is not operating with an assumed role.
I had ended up putting my server infrastructure on an EC2 instance with an assigned role; that role had the policies needed to create clusters (I just went with ECS full access). That way I avoided the authorization/token/etc. work altogether.
I did have some intermediate work where I had to use an identity (creds/token), and I used code like this:
```python
import configparser
import os


def get_aws_credentials() -> tuple:
    """
    Retrieve the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and return as a tuple
    :return: tuple of AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
    """
    home = os.path.expanduser("~")
    creds_path = os.path.join(home, '.aws/credentials')
    if not os.path.isfile(creds_path):
        raise ValueError(f'You must have a configured ~/.aws/credentials file - see '
                         f'https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html')
    config = configparser.ConfigParser()
    try:
        config.read(creds_path)
    except Exception as ex:
        logger.warning(f'Unable to read {creds_path}, are permissions correct?')
        raise ex
    return config['default'].get('aws_access_key_id'), config['default'].get('aws_secret_access_key')


def set_aws_credentials_env():
    key_id, key = get_aws_credentials()
    os.environ['AWS_ACCESS_KEY_ID'] = key_id
    os.environ['AWS_SECRET_ACCESS_KEY'] = key
    os.environ['REGION_NAME'] = DEFAULT_REGION
```
This also helped me avoid a token creation.
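If anyone wants to sanity-check the parsing half of that helper without touching a real `~/.aws/credentials` file, `configparser` can read from an in-memory string. The key values below are obviously fake:

```python
import configparser

# Exercise the same configparser logic against an in-memory
# credentials file instead of ~/.aws/credentials.
sample = """\
[default]
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = wJalrEXAMPLEKEY
"""
config = configparser.ConfigParser()
config.read_string(sample)
creds = (config["default"].get("aws_access_key_id"),
         config["default"].get("aws_secret_access_key"))
print(creds)
```

This mirrors the `config['default'].get(...)` lookups in Greg's snippet, so it's a cheap way to confirm your credentials file is in the shape the helper expects.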
Lastly, I have my server startup scripts that query the amazon environment for some settings, and in those queries I get an API token. I don't know if that is of any use to you:
```shell
TOKEN=`curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
EC2_IP=`curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/local-ipv4`
```
d
Hey Greg, thanks for that! I think our issues were slightly different. I don't need to create a session token [so far anyway…]; the problem is more that there's some sort of config mismatch between my agent and environment, but there doesn't seem to be any useful debug I can pull to see what's happening… That being said, Zach's last pointer enabled me to get the same error building the flow locally as I do on Fargate. Unfortunately none of these things explain to me why my flow is creating 2 task definitions rather than the 1 I requested 🙂
g
Well...sorry it wasn't of much use 🤷
d
Thanks @Zachary Hughes I think at this point what I really need is a working example from the Prefect side of how the Fargate agent and environment are supposed to work, and what pieces of the seemingly common config need to be passed to each one. If I could get that then I'd be able to piece together what's going on with mine. Aside from the `launch_type FARGATE` issue I still don't understand why it's creating 2 Task Definitions? If it was at all possible to get a working example of creating 2 containers in the FargateTaskEnvironment that'd be great! If not I'll just dump it to GitHub…
@Greg Desmarais No, that was definitely helpful, I may well end up following you down that same route 🙂
I'm debugging my way through the boto client code at the moment, and somewhere along the way it's being told to use `launchType=FARGATE`, but I don't know exactly where yet.
@Zachary Hughes I found the problem. launchType is hardcoded in the FargateTask.
```python
def __init__(  # type: ignore
    self,
    launch_type: str = "FARGATE",
    aws_access_key_id: str = None,
    aws_secret_access_key: str = None,
    aws_session_token: str = None,
    region_name: str = None,
    executor: "prefect.engine.executors.Executor" = None,
    executor_kwargs: dict = None,
    labels: List[str] = None,
    on_start: Callable = None,
    on_exit: Callable = None,
    metadata: dict = None,
    **kwargs,
) -> None:
    self.launch_type = launch_type
    # Not serialized, only stored on the object
    self.aws_access_key_id = aws_access_key_id or os.getenv("AWS_ACCESS_KEY_ID")
    self.aws_secret_access_key = aws_secret_access_key or os.getenv(
        "AWS_SECRET_ACCESS_KEY"
    )
    self.aws_session_token = aws_session_token or os.getenv("AWS_SESSION_TOKEN")
    self.region_name = region_name or os.getenv("REGION_NAME")
```
z
Okay, got it. Apologies if I'm missing something, but wouldn't you still want Fargate as the launch type regardless?
d
I do, but the problem I've been having is that Amazon is continually responding with the launch_type error. Removing it from the code might at least let me progress past this issue and see what other problems might be in the config.
z
Okay, solid! Since the Fargate Task Environment was designed to have that value hardcoded, I can't guarantee how it'll behave with it removed. But if you want to fork/branch Prefect and see how it operates with that removed, I'd be curious to hear the results.
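For anyone trying that fork-and-see experiment, the minimal change amounts to dropping the hardcoded `launchType` from the kwargs before `run_task` is called. A standalone sketch of that filtering step, untested against a real cluster:

```python
def without_launch_type(run_task_kwargs: dict) -> dict:
    """Sketch of the experiment discussed above: drop launchType from
    the run_task kwargs so AWS falls back to the task definition's own
    compatibilities. The input dict is left unmodified."""
    kwargs = dict(run_task_kwargs)  # copy so the caller's dict is untouched
    kwargs.pop("launchType", None)
    return kwargs
```

In a fork, this filtering would happen where the environment assembles its `run_task` call; here it's just a plain function so the idea can be tested in isolation.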