# prefect-community
d
Hey guys, does anyone have any experience using the FargateTaskEnvironment? Specifically the `aws_session_token`: I'm having problems getting AWS to accept it, and I keep getting the following error:
Failed to load and execute Flow's environment: ClientError('An error occurred (UnrecognizedClientException) when calling the RegisterTaskDefinition operation: The security token included in the request is invalid')
Sample config for the environment:
```python
flow.environment = FargateTaskEnvironment(
    launch_type="FARGATE",
    region="us-east-1",
    aws_session_token="FwoGZXIvYXdzEOL//////////wEaDFSgRMmr3M07yXJ3gCKCAdZO6f/LRZc6b7DjDip0lrTvCa+FDQpFAGJEyB6Ka1tF9By3fTgKkbqSM6EnuHQgTEviQJrOn13E7wlvKKVV1++YGaa3gb1Pn9q12BxCN7I6SvQ8oBW9AE73Judo0tuYTdTc5eYC7m2PaYU/d5fkRIj29EJWp9EpO3+yq/S1saxiGvYosKGy+AUyKBOOerH4ymGN9lxo/5aprU5DXumyaC6yg2satDNNoUdPaSBQ2R8fNq0=",
    cpu="256",
    memory="512",
    enable_task_revisions="True",
    ....
    ....
)
```
z
Hi @Darragh, are you able to use the session token to create a task definition manually? Trying to drill down into whether the issue is on the token/AWS IAM side or the Prefect side.
d
Hey @Zachary Hughes Couple of interesting points on that:
• Creating a manual task definition doesn't seem to require a session token at all
• When I create a TaskDefinition using the 2 images I'm trying to build into my Prefect Flow and run it manually, they both run
• The TaskDefinition created by Prefect apparently can't use the Fargate launch type? Screenshot below.
FargateTaskEnvironment code
z
When you say that creating a task definition doesn't seem to require a session token, is there any chance your local environment is falling back on stored AWS credentials or local config?
d
Creating a TaskDefinition manually in the AWS console doesn't require a session token, so it shouldn't be anything to do with the local env… The Prefect docs seem to suggest the session token is needed, but I'm not seeing why?
z
Ah, okay. Since AWS probably handles most of the IAM when you interact through the console, I'd presumed you were working with boto3. From my understanding, you might not need the session token: the environment accepts all the forms of auth that boto3 does, and session tokens are generally only used for temporary access. I'd recommend giving this a shot without the `aws_session_token` and seeing how it behaves.
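For anyone following along, the fallback order described here can be sketched in plain Python. This is an illustration of boto3-style credential resolution (explicit kwargs first, then environment variables), not Prefect's actual code:

```python
import os

def resolve_aws_credentials(aws_access_key_id=None,
                            aws_secret_access_key=None,
                            aws_session_token=None):
    """Illustrative sketch of boto3-style credential fallback:
    explicit arguments win, otherwise environment variables are used.
    The session token stays None for long-lived (non-STS) credentials."""
    return {
        "aws_access_key_id": aws_access_key_id or os.getenv("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": aws_secret_access_key or os.getenv("AWS_SECRET_ACCESS_KEY"),
        # Optional: only needed for temporary (STS) credentials.
        "aws_session_token": aws_session_token or os.getenv("AWS_SESSION_TOKEN"),
    }
```

The point is that leaving `aws_session_token` unset is a perfectly valid configuration when you're authenticating with a long-lived key pair.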
d
Yeah I tried that previously @Zachary Hughes, this is what I got:
Failed to load and execute Flow's environment: ParamValidationError('Parameter validation failed:\nMissing required parameter in input: "taskDefinition"')
z
Okay, gotcha. Just wrapped up reading through the previous thread about this, so hopefully I've got a bit more context to help with now. How are you creating this token? The fact that it's saying the token is invalid rather than malformed makes me think it might be expired. Alternatively, I'd expect to be able to provide something other than an `aws_session_token`. We can definitely open an issue for that change if you'd like.
d
It shouldn't have been expired, as I created it manually just before testing, but always possible. Could you confirm whether or not the `aws_session_token` is definitely needed? From the output I got above [running without the token] my config is definitely missing something, it's just not telling me what it is 🙂
z
Checking the documentation and the code itself, I don't think it should be required. So if just removing that value throws the error you posted, I think it's definitely unintentional.
d
Ok thanks - so with that last error above, `Missing required parameter`, it doesn't seem to be specific to the token. Is there any way of getting more info on what that error is?
z
I'm not sure if this is the culprit or tangential, but it looks like you're not passing a `taskDefinition`. I think you'll also need to pass a `taskDefinition` that matches the `family`, as mentioned in this link: https://docs.prefect.io/api/latest/environments/execution.html#fargatetaskenvironment
Looks like `Missing required parameter` is coming from boto3 itself, but taking a look to see if we can find any additional information.
Is this error being raised by the agent? If so, we pull the `family` argument from the `task_definition_name`, so that could definitely be the missing parameter for `taskDefinition`.
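To make the family/taskDefinition relationship concrete, here's a hedged sketch of the two boto3 payloads involved and the field that has to line up between them. The names here are illustrative, not taken from Darragh's config:

```python
def build_fargate_call_kwargs(task_definition_name, container_definitions):
    """Illustrative only: the 'family' passed to register_task_definition
    and the 'taskDefinition' passed to run_task must refer to the same
    name, otherwise run_task has no task definition to resolve."""
    register_kwargs = {
        "family": task_definition_name,
        "containerDefinitions": container_definitions,
    }
    run_kwargs = {
        "taskDefinition": task_definition_name,  # must match the family above
        "launchType": "FARGATE",
    }
    return register_kwargs, run_kwargs
```

If the two names drift apart, boto3 surfaces exactly the kind of `Missing required parameter in input: "taskDefinition"` validation error seen earlier in the thread.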
d
It’s being raised in the UI as the Flow status, so my guess is the agent. Testing the taskDefinition change now, good catch 👍
Well this is a new and very unwelcome development…
Failed to load and execute Flow's environment: InvalidParameterException('An error occurred (InvalidParameterException) when calling the RunTask operation: Task definition does not support launch_type FARGATE.')
z
That's definitely unanticipated. I think the answer is probably "yes," but the default value for `launch_type` is FARGATE-- do you see this error even when you don't specify `launch_type` in your environment?
It's also super unlikely, but what version of the AWS CLI are you running? If it's out of date, that could explain the inability to accept FARGATE as a launch type.
d
I was on 1.16, upgrading now to test… Leaving the launch type out of the config didn’t make any difference
Upgrade didn’t solve it either, still the same error
z
Doing some digging here-- not 100% sure what could be happening on the AWS end to prompt this error.
A lot of the `Task definition does not support launch_type FARGATE` errors I'm seeing online have to do with not properly setting `requiresCompatibilities`. Can you confirm that you haven't set your agent's `launch_type` to EC2, and try specifying FARGATE for `requiresCompatibilities` in your task definition?
d
Is `requiresCompatibilities` on the agent or the Flow environment, or both?
z
It's on both. As long as you haven't overridden the agent's `launch_type`, it should default to setting the appropriate `requiresCompatibilities` field there. You may need to specify it for your flow environment, though.
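As a reference point, a task definition that AWS will accept for the FARGATE launch type generally needs at least the shape below. This is a sketch based on the AWS task definition requirements; the family, image, and sizing values are illustrative, not from the thread:

```python
def minimal_fargate_task_definition(family, image):
    """Sketch of the fields AWS requires before RunTask with
    launchType=FARGATE will succeed: requiresCompatibilities must
    include FARGATE, networkMode must be awsvpc, and cpu/memory
    must be set at the task level."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],  # missing this triggers the
                                                 # "does not support launch_type FARGATE" error
        "networkMode": "awsvpc",                 # Fargate only supports awsvpc
        "cpu": "256",
        "memory": "512",
        "containerDefinitions": [{"name": "flow", "image": image}],
    }
```

Checking the registered task definition in the ECS console against this shape is a quick way to spot which field didn't make it through.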
d
launch_type is FARGATE on the agent, so should I change?
Still same error either way
z
Nope, your agent sounds good-- you'd want to update that field on the environment. But you're still seeing this even with the environment's `requiresCompatibilities` set?
d
Yeah, just double checked there, still the same error with `requiresCompatibilities` set on the env.
z
That is very, very strange. If you're able to, I'd try putting some breakpoints or print statements around this line to see what's going on on the agent side of this: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/agent/fargate/agent.py#L670-L675 and this line if you're interested in the environment side of this: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/environments/execution/fargate/fargate_task.py#L271 Barring that, it would be useful to see if you can recreate this issue using pure boto3. The issue appears to be on the AWS side, but it doesn't appear to be raising a particularly informative error.
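One low-tech way to get those print statements without editing Prefect itself is to wrap the boto3 client in a logging proxy. This is a debugging sketch, not part of Prefect; you'd have to arrange for the wrapped client to be the one the agent or environment actually uses:

```python
class LoggingClient:
    """Thin wrapper that prints every method call made on a boto3
    client, so you can see exactly what register_task_definition and
    run_task receive before AWS rejects them."""
    def __init__(self, client):
        self._client = client

    def __getattr__(self, name):
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr
        def wrapper(*args, **kwargs):
            # Print the operation name and which kwargs were supplied.
            print(f"boto3 call: {name} kwargs={sorted(kwargs)}")
            return attr(*args, **kwargs)
        return wrapper
```

Usage would look like `ecs = LoggingClient(boto3.client("ecs"))`, after which every `ecs.run_task(...)` call logs its arguments before passing them through unchanged.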
d
Print statements I can do, but breakpoints on an agent running on EC2 could be tricky 😁 With respect to trying with pure boto3, I'm not sure what that would prove, from a user perspective? I've tried doing a manual task definition with 2 containers (which is all I'm trying to prove right now) and that works on Fargate, but not when I define it as a Prefect Fargate task… I'll try the print statements tomorrow as it's hitting midnight now. Thanks for all the help @Zachary Hughes!!
So, new and interesting things to report @Zachary Hughes! Question the first though, cos I may have made a massively stupid assumption. If I'm using FargateTaskEnvironment, what agent should I be using?
z
I'd anticipate you using a Fargate agent!
d
And you would be correct! HOWEVER. Strange and interesting things happen, due to what's possibly a dodgy config on my part. I had a `containerDefinitions` section in my agent config, and when I added logging to the lines you pointed out above, I got the following back:
Untitled
The response from `boto3.register_task_definition` was returning this, which shows it using the `containerDefinitions` data from the agent, rather than the flow.
But to make it extra painful, when I remove the `containerDefinitions` section from my agent, I now don't get the debug response output from my agent at all. I'm wondering is there some funky combination of configs between my agent and the environment that's causing the problem…?
And the output from the `run_task`:
Untitled
Neither of those JSON chunks shows any hint of the 2nd container defined in my flow.
z
You mentioned that when you remove `containerDefinitions` from the agent, you don't get your debug statement. Does the attempted run still fail with the same error we were discussing yesterday?
d
Yeah, still the same error: `Task definition does not support launch_type FARGATE.` I've removed the launch type from both the agent and the env, no luck.
z
I think the interaction between the agent's configuration and the environment's configuration could explain why you're not seeing the second container defined in your flow, but it still doesn't explain the `Task definition does not support launch_type FARGATE` error you get when the `requiresCompatibilities` field is clearly showing FARGATE.
Stop me if you've already tried this, but can you create and run any Fargate tasks on your cluster?
d
Yeah, it's odd. I've no issues creating other Fargate tasks; in fact I can scale out a bunch of parallel cloned flows on up to 100 nodes, but that's using the `FargateAgent` and `LocalEnvironment(executor=LocalExecutor())`.
I'll try pinging it off AWS support, see if they can dig into where the request is going wrong. Their internal diagnostics tend to be pretty good, so I've got the fingers crossed 🙂
z
Yeah. Sorry I can't be of more assistance, but the config you're providing to boto3 looks like it should work.
g
Hey @Darragh - did you ever get your config working? Would you be willing to share the client side (executor/environment, etc.) and agent setup? I was also wondering if you were working with an existing cluster or setting one up dynamically. And if the former, are you in the default cluster or a different one?
tia...
d
Hey @Greg Desmarais haven't gotten a resolution on it yet, I need to push it to AWS support this evening and see if they can shed any light. If I get anything back I'll post it and share!
Hey @Zachary Hughes Back on this again. I've added logging output for all of the args that get passed to the boto3 client, and it doesn't look like the containers I configure in my FargateTaskEnvironment are being passed along. Here's a whole chunk of output that might be useful to you… I'm holding off passing to AWS Support until we know definitively if the issue is on me/us or them.
fargate agent output
I can't see any mention of the 2 containers in there anywhere, despite being named in the flow's `FargateTaskEnvironment`…
z
Hmm, that's a bit odd. Is this still failing with the error about Fargate not being supported? I notice that `requiresCompatibilities` does seem to be properly set.
d
Yeah, same error.
z
Hm. Given what I've read about that error, that feels like something wonky on the AWS side to me. I'm not 100% sure about the rest of it, but I'm fairly certain your code shouldn't be failing in that particular manner.
g
Thanks for getting back, @Darragh. I've gotten close, but it took a lot of debugging of the agent and trial and error with parameter names. At this point, I have moved on to working towards a different approach to clustering (with Prefect).
d
Quick update guys, no answer back from AWS yet, but I did notice something I should've spotted before. My Flow configuration uses FargateTaskEnvironment, adds 2 containers in `containerDefinitions`, and also has the property `taskDefinition` set. `taskDefinition` is set to `my-flow`. Now what I've just spotted is that this Flow [source above somewhere] actually creates 2 Task Definitions in ECS:
• Task Definition A - `my-flow` -> correctly configured with 2 containers, etc…
• Task Definition B - `parallell-collector` - this is the name of the overall Flow, but it only contains 1 container, the one defined by the Docker Storage in the Flow
Seems a little odd that it would be creating 2 Task Definitions here. As an even more entertaining note, manually running `my-flow` on Fargate actually succeeds; I can see the TaskDefinition run to completion and dig into its logs. Any thoughts? I'm perfectly happy to accept an answer of "Oh your config is batshit, here's what it should really look like" 🙂
z
Hi Darragh, apologies for the delay in response. To be honest, I'm not sure what your task definitions should look like. If you're up for inspecting your Prefect environment, you can try something along the lines of
```python
flow.environment = FargateTaskEnvironment(...)
flow.environment.setup(flow)
flow.environment.execute(flow)
```
If you're running into issues with your Prefect environment, we can open an issue to take a look. But I'm not 100% sure what your flow should look like on the Fargate side.
d
@Zachary Hughes Wow, instant fail came back out of that one!! Using your suggestion I got the following:
```
Traceback (most recent call last):
  File "flowRunner.py", line 44, in <module>
    python_dependencies=args.python_dependencies, execution=args.execution)
  File "./definitions/data/parallell/collector/parallellCollector.py", line 109, in main
    flow.environment.execute(flow)
  File "/Users/Darragh/.pyenv/versions/3.6.10/lib/python3.6/site-packages/prefect/environments/execution/fargate/fargate_task.py", line 309, in execute
    **self.task_run_kwargs,
  File "/Users/Darragh/.pyenv/versions/3.6.10/lib/python3.6/site-packages/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/Darragh/.pyenv/versions/3.6.10/lib/python3.6/site-packages/botocore/client.py", line 635, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidParameterException: An error occurred (InvalidParameterException) when calling the RunTask operation: Task definition does not support launch_type FARGATE.
```
Hey @Zachary Hughes I know I keep bugging you about this stuff; is there anyone in the group who's an expert on the FargateTaskEnvironment? They might be able to get me sorted and stop me annoying you with questions 🙂
g
Hey @Darragh - I ended up bailing on the Fargate approach. I found too many issues that I had to battle, and in the end we will want to do some things with our images that the ec2 instances use in the cluster (mounting nfs/efs, etc.). I'm sorry - I got close, but failed, and bailed.
d
Hey @Greg Desmarais Yeah I’m close to that point too, it’s causing me all kinds of pain. I’d still like to make it work as it satisfies our use case better, but if it comes to it then the bail is the only way
z
Hi @Darragh, absolutely not bugging! I've been pinging some of these questions off the team, so there's a bit of collective Fargate knowledge being surfaced here. That said, with this `Task definition does not support launch_type` issue, I think your best bet is to either open this issue up to the channel at large and see if anyone in the community has hit it, or to see what AWS has to say. You're also welcome to open an issue so we can socialize it and potentially figure out what's going on here!
g
Ok @Darragh - let me give more than a cursory look at the error...
Where is your prefect server and agent running?
I ask because including AWS creds implies that the process creating the cluster is not operating with an assumed role.
I had ended up putting my server infrastructure on an EC2 instance with an assigned role; that role had the policies needed to create clusters (I just went with ECS full access). That way I avoided the authorization/token/etc. work altogether.
I did have some intermediate work where I had to use an identity (creds/token), and I used code like this:
```python
import configparser
import os


def get_aws_credentials() -> tuple:
    """
    Retrieve the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and return as a tuple
    :return: tuple of AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
    """
    home = os.path.expanduser("~")
    creds_path = os.path.join(home, '.aws/credentials')
    if not os.path.isfile(creds_path):
        raise ValueError(f'You must have a configured ~/.aws/credentials file - see '
                         f'https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html')
    config = configparser.ConfigParser()
    try:
        config.read(creds_path)
    except Exception as ex:
        logger.warning(f'Unable to read {creds_path}, are permissions correct?')
        raise ex
    return config['default'].get('aws_access_key_id'), config['default'].get('aws_secret_access_key')


def set_aws_credentials_env():
    key_id, key = get_aws_credentials()
    os.environ['AWS_ACCESS_KEY_ID'] = key_id
    os.environ['AWS_SECRET_ACCESS_KEY'] = key
    os.environ['REGION_NAME'] = DEFAULT_REGION
```
This also helped me avoid a token creation.
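If anyone wants to sanity-check the parsing half of that helper without touching a real `~/.aws/credentials` file, `configparser` can read from an in-memory string. The key values below are obviously fake:

```python
import configparser

# Exercise the same configparser logic against an in-memory
# credentials file instead of ~/.aws/credentials.
sample = """\
[default]
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = wJalrEXAMPLEKEY
"""
config = configparser.ConfigParser()
config.read_string(sample)
creds = (config["default"].get("aws_access_key_id"),
         config["default"].get("aws_secret_access_key"))
print(creds)
```

This mirrors the `config['default'].get(...)` lookups in Greg's snippet, so it's a cheap way to confirm your credentials file is in the shape the helper expects.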
Lastly, I have my server startup scripts that query the amazon environment for some settings, and in those queries I get an API token. I don't know if that is of any use to you:
```shell
TOKEN=`curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
EC2_IP=`curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/local-ipv4`
```
d
Hey Greg, thanks for that! I think our issues were slightly different. I don't need to create a session token [so far anyway…]; the problem is more that there's some sort of config mismatch between my agent and environment, but there doesn't seem to be any useful debug I can pull to see what's happening… That being said, Zach's last pointer enabled me to get the same error building the flow locally as I do on Fargate. Unfortunately none of these things explain to me why my flow is creating 2 task definitions rather than the 1 I requested 🙂
g
Well...sorry it wasn't of much use 🤷
d
Thanks @Zachary Hughes I think at this point what I really need is a working example from the Prefect side of how the Fargate agent and environment are supposed to work, and what pieces of the seemingly common config need to be passed to each one. If I could get that then I'd be able to piece together what's going on with mine. Aside from the `launch_type FARGATE` issue I still don't understand why it's creating 2 Task Definitions? If it was at all possible to get a working example of creating 2 containers in the FargateTaskEnvironment that'd be great! If not I'll just dump it to GitHub…
@Greg Desmarais No, that was definitely helpful, I may well end up following you down that same route 🙂
I'm debugging my way through the boto client code at the moment, and somewhere along the way it's being told to use `launchType=FARGATE`, but I don't know exactly where yet.
@Zachary Hughes I found the problem. launchType is hardcoded in the FargateTask.
```python
def __init__(  # type: ignore
    self,
    launch_type: str = "FARGATE",
    aws_access_key_id: str = None,
    aws_secret_access_key: str = None,
    aws_session_token: str = None,
    region_name: str = None,
    executor: "prefect.engine.executors.Executor" = None,
    executor_kwargs: dict = None,
    labels: List[str] = None,
    on_start: Callable = None,
    on_exit: Callable = None,
    metadata: dict = None,
    **kwargs,
) -> None:
    self.launch_type = launch_type
    # Not serialized, only stored on the object
    self.aws_access_key_id = aws_access_key_id or os.getenv("AWS_ACCESS_KEY_ID")
    self.aws_secret_access_key = aws_secret_access_key or os.getenv(
        "AWS_SECRET_ACCESS_KEY"
    )
    self.aws_session_token = aws_session_token or os.getenv("AWS_SESSION_TOKEN")
    self.region_name = region_name or os.getenv("REGION_NAME")
```
z
Okay, got it. Apologies if I'm missing something, but wouldn't you still want Fargate as the launch type regardless?
d
I do, but the problem I've been having is that Amazon is continually responding with the launch_type error. Removing it from the code might at least let me progress past this issue and see what other problems might be in the config.
z
Okay, solid! Since the Fargate Task Environment was designed to have that value hardcoded, I can't guarantee how it'll behave with it removed. But if you want to fork/branch Prefect and see how it operates with that removed, I'd be curious to hear the results.
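For anyone trying that fork-and-see experiment, the minimal change amounts to dropping the hardcoded `launchType` from the kwargs before `run_task` is called. A standalone sketch of that filtering step, untested against a real cluster:

```python
def without_launch_type(run_task_kwargs: dict) -> dict:
    """Sketch of the experiment discussed above: drop launchType from
    the run_task kwargs so AWS falls back to the task definition's own
    compatibilities. The input dict is left unmodified."""
    kwargs = dict(run_task_kwargs)  # copy so the caller's dict is untouched
    kwargs.pop("launchType", None)
    return kwargs
```

In a fork, this filtering would happen where the environment assembles its `run_task` call; here it's just a plain function so the idea can be tested in isolation.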