a

ale

10/06/2020, 4:53 PM
Hey folks, has anyone experienced issues when creating task definitions on Fargate? The task definitions are created, but at task setup I get the following error:
c

Chris White

10/06/2020, 4:54 PM
Hi Alessandro - can you please copy / paste this into a thread? It takes up the entire Slack window and prevents other threads from being seen
👍 1
a

ale

10/06/2020, 4:54 PM
Failed to load and execute Flow's environment: ValueError('The given taskDefinition does not match the existing 

taskDefinition 00-orchestrator.\nDetail:   

containerDefinition.0.environment -> 

Given: 
	[
		{'name': 'PREFECT__CLOUD__GRAPHQL', 'value': '<https://apollo-prefect-server.srv-stage.xxx.xyz/graphql/graphql>'}, 
		{'name': 'PREFECT__CLOUD__USE_LOCAL_SECRETS', 'value': 'false'}, 
		{'name': 'PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudFlowRunner'}, 
		{'name': 'PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudTaskRunner'}, 
		{'name': 'PREFECT__LOGGING__LOG_TO_CLOUD', 'value': 'true'}, 
		{'name': 'PREFECT__LOGGING__EXTRA_LOGGERS', 'value': '[]'}
	], 
Expected: 
	[
		{'name': 'PREFECT__CLOUD__AGENT__LABELS', 'value': '[]'}, 
		{'name': 'PREFECT__LOGGING__LEVEL', 'value': 'INFO'}, 
		{'name': 'PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudTaskRunner'}, 
		{'name': 'PREFECT__LOGGING__LOG_TO_CLOUD', 'value': 'true'}, 
		{'name': 'PREFECT__CLOUD__USE_LOCAL_SECRETS', 'value': 'false'}, 
		{'name': 'PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudFlowRunner'}, 
		{'name': 'PREFECT__CLOUD__API', 'value': '<https://apollo-prefect-server.srv-stage.xxx.xyz/graphql>'}
	]  

containerDefinition.0.name -> 
Given: flow-container, 
Expected: flow  

containerDefinition.0.command -> 
Given: ['/bin/sh', '-c', "python -c 'import prefect; prefect.environments.execution.load_and_run_flow()'"], 
Expected: ['/bin/sh', '-c', 'prefect execute flow-run']  

taskRoleArn -> 
Given: arn:aws:iam::xxxxxxxxxxxx:role/stage-prefect-flows-00-orchestrator-rTaskRole-1JJ3YRSUBDEUY, 
Expected: arn:aws:iam::xxxxxxxxxxxx:role/stage-safelake-etl-rAgentTaskRole-1JTX5S73405MQ  

executionRoleArn -> 
Given: arn:aws:iam::xxxxxxxxxxxx:role/stage-prefect-flows-00-orchestrator-rExecutionRole-CLWR2BUOW8IW, 
Expected: arn:aws:iam::xxxxxxxxxxxx:role/stage-safelake-etl-rExecutionRole-1QPZMUQ6CUP5X  

memory -> 
Given: 1024, 
Expected: 512

\n\nIf the given configuration is desired, deregister the existing\ntaskDefinition and re-run the flow. Alternatively, you can\nchange the family/taskDefinition name in the FargateTaskEnvironment\nfor this flow.')
👍 1
s

Spencer

10/06/2020, 5:09 PM
I actually wrote the PR that introduced this error
upvote 1
It's a safety mechanism
The new environment doesn't match the environment that's going to be used by your flows, so it errors out since there is a mismatch and gives you some approaches to address it at the end of the error message.
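For reference, the first remedy named at the end of the error message (deregistering the existing taskDefinition) could be scripted roughly as follows. This is a minimal sketch, not something taken from Prefect: `deregister_family` is a hypothetical helper, and `ecs` is assumed to be a boto3 ECS client such as `boto3.client("ecs")`.

```python
def deregister_family(ecs, family):
    """Deregister every ACTIVE revision of one task definition family.

    `ecs` is assumed to be a boto3 ECS client (hypothetical helper, not Prefect code).
    Returns the list of ARNs that were deregistered.
    """
    arns = []
    kwargs = {"familyPrefix": family, "status": "ACTIVE"}
    # Collect all matching ARNs first, following pagination tokens.
    while True:
        page = ecs.list_task_definitions(**kwargs)
        arns.extend(page["taskDefinitionArns"])
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token

    deregistered = []
    for arn in arns:
        # familyPrefix is a prefix match, so keep exact family matches only.
        # ARN tail looks like "task-definition/<family>:<revision>".
        name = arn.split("/")[-1].rsplit(":", 1)[0]
        if name == family:
            ecs.deregister_task_definition(taskDefinition=arn)
            deregistered.append(arn)
    return deregistered
```

Calling `deregister_family(boto3.client("ecs"), "00-orchestrator")` would clear the family so that the next flow run registers it again with the new values.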
a

ale

10/06/2020, 5:15 PM
Thanks for the quick reply @Spencer! I can’t understand why this error happens, since the task definition is created by the Fargate agent. It seems the Fargate agent creates the task definition with one configuration, but the flow's task expects another. What am I missing?
s

Spencer

10/06/2020, 5:15 PM
??
The error explains it
It says there's an existing task definition that matches the name exactly
The pre-existing task definition has different values than those provided in the new environment. As the FargateTaskEnvironment currently runs, it would NOT have used the new values, which is presumably not desirable (otherwise, why did you change the values?). Prior to this error, your flows would run but with the OLD values (cpu/memory, etc.) and thus not behave as anticipated.
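The safety check described here can be pictured with a small sketch. It is not the code from the PR: `diff_task_definition` and `validate_task_definition` are hypothetical names, and the list of compared keys is only an assumption based on the fields that appear in the error output above.

```python
def diff_task_definition(given, existing,
                         keys=("cpu", "memory", "taskRoleArn", "executionRoleArn")):
    """Return {key: (given_value, existing_value)} for each compared key that differs."""
    differences = {}
    for key in keys:
        if key in given and given[key] != existing.get(key):
            differences[key] = (given[key], existing.get(key))
    return differences


def validate_task_definition(given, existing):
    """Raise ValueError if the flow's definition disagrees with the registered one."""
    differences = diff_task_definition(given, existing)
    if differences:
        detail = "\n".join(
            f"\t{key} -> Given: {g}, Expected: {e}"
            for key, (g, e) in differences.items()
        )
        raise ValueError(
            "The given taskDefinition does not match the existing taskDefinition "
            f"{existing.get('family')}.\nDetail:\n{detail}"
        )
```

Without such a check, a flow whose environment asks for `memory="1024"` would silently keep running on an already registered definition with `memory="512"`.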
a

ale

10/06/2020, 5:19 PM
I agree with you that this is the intended behaviour. The strange thing is that the error happens even when the task definition does not exist and is created from scratch by the agent
s

Spencer

10/06/2020, 5:20 PM
How in the world could it not exist?
Where are the "expected" values coming from if not from the AWS API? Is there no taskDefinition of family 00-orchestrator? Perhaps it's in a different region?
a

ale

10/06/2020, 5:22 PM
Let me try to describe what happens:
1. I deploy my flow, configured with environment and storage, to Prefect Server. At this point no task definition exists on ECS.
2. I hit run on flow 00-orchestrator from Prefect Server.
3. The Fargate Agent creates the task definition.
4. The flow terminates with the reported error.
And this is how I define the environment for the flow
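from prefect.environments import FargateTaskEnvironment

# ETL_FLOW_NAME, ETL_IMAGE_NAME and the two role ARNs are defined elsewhere (not shown)
environment = FargateTaskEnvironment(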
    cpu="256",
    memory="1024",
    family=ETL_FLOW_NAME,
    taskDefinition=ETL_FLOW_NAME,
    taskRoleArn=ETL_TASK_ROLE_ARN,
    executionRoleArn=ETL_EXECUTION_ROLE_ARN,
    requiresCompatibilities=["FARGATE"],
    containerDefinitions=[
        {
            "name": "flow",
            "image": ETL_IMAGE_NAME,
            "command": [
                "/bin/sh",
                "-c",
                "prefect execute flow-run"
            ],
            "environment": [],
            "essential": True,
        }
    ]
)
s

Spencer

10/06/2020, 5:26 PM
What is the value of ETL_FLOW_NAME? That is the name of the pre-existing taskDefinition
a

ale

10/06/2020, 5:27 PM
00-orchestrator
s

Spencer

10/06/2020, 5:27 PM
I'm not sure what to tell you about the task definition existing or not
It literally tries to access the task definition via boto3, and if it doesn't exist, it creates it
The error you're getting can only happen if the task definition exists prior to the flow run
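The "look it up via boto3, create it if missing" behaviour described here follows a common get-or-create pattern. The sketch below illustrates that pattern only and is not Prefect's actual implementation: `ensure_task_definition` is a hypothetical helper, `ecs` is assumed to be a boto3 ECS client, and it assumes a missing definition surfaces as the client's `ClientException`.

```python
def ensure_task_definition(ecs, definition):
    """Return (task_definition, created) for the family named in `definition`.

    `ecs` is assumed to be a boto3 ECS client; `definition` holds the keyword
    arguments for register_task_definition (family, cpu, memory, ...).
    """
    family = definition["family"]
    try:
        # If the family is already registered, the existing values win.
        existing = ecs.describe_task_definition(taskDefinition=family)
        return existing["taskDefinition"], False
    except ecs.exceptions.ClientException:
        # Assumption: a lookup of an unknown family raises ClientException.
        created = ecs.register_task_definition(**definition)
        return created["taskDefinition"], True
```

Note that the second branch never runs when a definition with the same family is already registered, which is why a name collision leaves the old values in place.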
a

ale

10/06/2020, 5:31 PM
Could it be that the Fargate Agent creates a task definition which is different from the one expected by the FargateTaskEnvironment?
s

Spencer

10/06/2020, 5:31 PM
The agent doesn't create the flow's task definition
It creates a separate task definition using the values passed to the Fargate Agent
I already explained this yesterday
The agent runs a task, which in turn runs your flow in another task
The agent will create a taskDefinition for this intermediate task
This intermediate task != your flow's task and as such they do not share the same taskDefinition.
Perhaps you've used the same family for your Agent's configuration as you have for the FargateTaskEnvironment? That would cause this issue
a

ale

10/06/2020, 5:35 PM
Will definitely check, thanks for the hints! And thanks a lot for your patience 🙂
However, it’s not clear where I should set the agent task definition family
s

Spencer

10/06/2020, 5:43 PM
where or what?
a

ale

10/06/2020, 5:45 PM
Both, at this point 😓
s

Spencer

10/06/2020, 5:49 PM
How are you configuring the agent with 00-orchestrator? Just change that to something else like prefect-launch or something 🤷‍♂️
a

ale

10/06/2020, 5:51 PM
Will try tomorrow, thanks @Spencer!
I was finally able to get a flow running. However, it only runs successfully the first time; from the second run onward it gives me the following error:
Failed to load and execute Flow's environment: ValueError("The given taskDefinition does not match the existing taskDefinition 01-process-event-flow.\nDetail: \n

\tcontainerDefinition.0.environment -> 
Given: [
{'name': 'PREFECT__CLOUD__GRAPHQL', 'value': '<https://apollo-prefect-server.srv-stage.cloudacademy.xyz/graphql/graphql>'}, 
{'name': 'PREFECT__CLOUD__USE_LOCAL_SECRETS', 'value': 'false'}, 
{'name': 'PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudFlowRunner'}, 
{'name': 'PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudTaskRunner'}, 
{'name': 'PREFECT__LOGGING__LOG_TO_CLOUD', 'value': 'true'}, 
{'name': 'PREFECT__LOGGING__EXTRA_LOGGERS', 'value': '[]'}
], 

Expected: [
{'name': 'PREFECT__CLOUD__GRAPHQL', 'value': '<https://apollo-prefect-server.srv-stage.cloudacademy.xyz/graphql/graphql>'}
{'name': 'PREFECT__CLOUD__USE_LOCAL_SECRETS', 'value': 'false'}, 
{'name': 'PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudFlowRunner'}, 
{'name': 'PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudTaskRunner'}, 
{'name': 'PREFECT__LOGGING__LOG_TO_CLOUD', 'value': 'true'}, 
{'name': 'PREFECT__LOGGING__EXTRA_LOGGERS', 'value': '[]'}, 

]

\n\nIf the given configuration is desired, deregister the existing\ntaskDefinition and re-run the flow. Alternatively, you can\nchange the family/taskDefinition name in the FargateTaskEnvironment\nfor this flow.")
But I don’t see differences between Given and Expected 😅
s

Spencer

10/07/2020, 12:17 PM
Interesting; though I'm curious about the extra blank line in the expected and mismatch on some of the commas. Perhaps look at the taskDefinition in the AWS console to see what environment is specified in the task definition? If it's the same as Given, then it's probably a slight oversight in the validation logic.
a

ale

10/07/2020, 12:37 PM
I added the blank lines just for readability
Here’s another example
Failed to load and execute Flow's environment: ValueError("The given taskDefinition does not match the existing taskDefinition 01-process-event-flow.\nDetail: \n\tcontainerDefinition.0.environment -> Given: [{'name': 'PREFECT__CLOUD__GRAPHQL', 'value': '<https://apollo-prefect-server.srv-stage.cloudacademy.xyz/graphql/graphql>'}, {'name': 'PREFECT__CLOUD__USE_LOCAL_SECRETS', 'value': 'false'}, {'name': 'PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudFlowRunner'}, {'name': 'PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudTaskRunner'}, {'name': 'PREFECT__LOGGING__LOG_TO_CLOUD', 'value': 'true'}, {'name': 'PREFECT__LOGGING__EXTRA_LOGGERS', 'value': '[]'}], Expected: [{'name': 'PREFECT__CLOUD__USE_LOCAL_SECRETS', 'value': 'false'}, {'name': 'PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudFlowRunner'}, {'name': 'PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS', 'value': 'prefect.engine.cloud.CloudTaskRunner'}, {'name': 'PREFECT__LOGGING__LOG_TO_CLOUD', 'value': 'true'}, {'name': 'PREFECT__LOGGING__EXTRA_LOGGERS', 'value': '[]'}, {'name': 'PREFECT__CLOUD__GRAPHQL', 'value': '<https://apollo-prefect-server.srv-stage.cloudacademy.xyz/graphql/graphql>'}]\n\nIf the given configuration is desired, deregister the existing\ntaskDefinition and re-run the flow. Alternatively, you can\nchange the family/taskDefinition name in the FargateTaskEnvironment\nfor this flow.")
I can confirm that the task definition in AWS is the same as the one in Given. The problem seems to be the validation between the elements in environment, which are in a different order. @Spencer does this make sense to you?
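The ordering problem is easy to reproduce in plain Python: two lists holding the same name/value entries in a different order compare as unequal. The sketch below shows this with a shortened, made-up pair of lists and one possible order-insensitive comparison; `same_environment` is a hypothetical helper, not part of Prefect.

```python
def same_environment(given, expected):
    """Compare two ECS-style environment lists while ignoring entry order."""
    def by_name(entry):
        return entry["name"]

    # Dicts cannot be ordered directly, so sort on the "name" field.
    return sorted(given, key=by_name) == sorted(expected, key=by_name)


given = [
    {"name": "PREFECT__CLOUD__USE_LOCAL_SECRETS", "value": "false"},
    {"name": "PREFECT__LOGGING__LOG_TO_CLOUD", "value": "true"},
]
expected = [
    {"name": "PREFECT__LOGGING__LOG_TO_CLOUD", "value": "true"},
    {"name": "PREFECT__CLOUD__USE_LOCAL_SECRETS", "value": "false"},
]

order_sensitive = given == expected                  # False: order differs
order_insensitive = same_environment(given, expected)  # True: same entries
```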
s

Spencer

10/07/2020, 1:27 PM
Yep, I was aware that the environment could be read out of order (thanks AWS!); I think I learned that from interacting with ECS rather than through prefect (and after the PR). Wasn't sure if that was the issue but it clearly is.
👍 1
Should be a straightforward fix; update https://github.com/PrefectHQ/prefect/blob/f554d1313638286d4183b136f97b56bfbfae58db/src/prefect/environments/execution/fargate/fargate_task.py#L255-L256 with something like:
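# Sort by name: the entries are dicts, which plain sorted() cannot order
given=sorted(value, key=lambda e: e["name"]) if key == "environment" else value,
expected=sorted(existing_container_definition.get(key), key=lambda e: e["name"]) if key == "environment" else existing_container_definition.get(key),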
a

ale

10/07/2020, 1:35 PM
s

Spencer

10/07/2020, 1:38 PM
yeah, of course, the equality part 😅
a

ale

10/07/2020, 1:40 PM
Yep
s

Spencer

10/07/2020, 1:59 PM
Why not make a PR?
s

Spencer

10/16/2020, 1:25 AM
I came back to this and made another PR: https://github.com/PrefectHQ/prefect/pull/3514
a

ale

10/16/2020, 8:46 AM
From the PR I understand that the fix I made doesn't seem to be working 😅
m

Maikel Penz

11/03/2020, 7:59 AM
^ I can see this out-of-order failure is still happening btw. @Spencer moved sorted to the givenContainerDefinitions and expectedContainerDefinitions variables, but the error log prints them out of order (same as the original behaviour @ale raised)
actually scratch that.. it’s working fine 😬.. wrong version pinned 🤦
👍 1