# ask-community
r
Could someone from @Prefect clarify when and how `job_variables` overrides are passed through to ECS containers in a push work pool? I have a push work pool set up to receive custom `cloudwatch_logs_options` from each deployment linked to that work pool (`stream_output` and `configure_cloudwatch_logs` default to True). I'm able to pass through options like the ones below, but flow runs' logs don't actually output to the specified log group.
```
"cloudwatch_logs_options": {
    "awslogs-region": "us-east-1",
    "awslogs-group": "test-logs-group",
    "awslogs-create-group": "false",  # already exists
    "awslogs-stream-prefix": "my_deployment_name"
  }
```
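(For reference, here's a sketch of the full `job_variables` payload I'm describing for a deployment on this work pool. I'm assuming the variable names exposed by the ECS work pool template are `configure_cloudwatch_logs`, `stream_output`, and `cloudwatch_logs_options`; the values mirror the snippet above.)

```json
{
  "configure_cloudwatch_logs": true,
  "stream_output": true,
  "cloudwatch_logs_options": {
    "awslogs-region": "us-east-1",
    "awslogs-group": "test-logs-group",
    "awslogs-create-group": "false",
    "awslogs-stream-prefix": "my_deployment_name"
  }
}
```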
A script I use to pull container definitions at runtime further reveals that no log options were configured at the container level, unlike previously under our Agent-based system. I've checked the related IAM role and it has CloudWatchLogsFullAccess enabled, so it's not a permissions issue. The work pool is in any case using credentials that worked under our previous Agent-based setup. Unfortunately, the lack of push work pool logs makes it impossible to troubleshoot further at the work pool level. I've dug around quite a lot in the latest `prefect` and `prefect_aws` code, since the docs are sparse on these topics, but can't seem to find where or how work pools pass through job variables, and ECS ones in particular. Help would be really appreciated! I've gone well into the weeds but feel like the returns to self-directed inquiry are shrinking.
j
hey, could you share your work pool id? I don't know if it's the issue, but `configure_cloudwatch_logs` does need to be explicitly set to True (it does not default to on)
r
I manually set `configure_cloudwatch_logs` and `stream_output` to default to True in the job template
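(Roughly like this in the base job template, trimmed to just those two variables; the real template has many more properties. This is a sketch assuming the standard work pool layout of a `variables` JSON schema alongside `job_configuration`.)

```json
{
  "variables": {
    "type": "object",
    "properties": {
      "configure_cloudwatch_logs": {
        "type": "boolean",
        "default": true
      },
      "stream_output": {
        "type": "boolean",
        "default": true
      }
    }
  }
}
```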
actually, please use this work pool: c93d1948-f85b-4747-8c40-1b61f318e82e
j
thank you! will try and take a look in a bit
r
Thanks!
j
just to confirm some things:
• your runs are finishing without error
• you're not seeing any log options on the ECS task container
could you share the JSON for `ContainerDefinitions` from your Task Definition revision?
r
Hey @Jake Kaplan, sorry for dropping the ball on this, our conversation happened right as I was heading out on vacation
I’m looping back to this now
here's the relevant `ContainerDefinitions` JSON:
```
"containerDefinitions": [
        {
            "name": "prefect",
            "image": "421396523132.dkr.ecr.us-east-1.amazonaws.com/prefect:gridded-etl-dev-latest",
            "cpu": 0,
            "links": [],
            "portMappings": [],
            "essential": true,
            "entryPoint": [],
            "command": [],
            "environment": [],
            "environmentFiles": [],
            "mountPoints": [],
            "volumesFrom": [],
            "dnsServers": [],
            "dnsSearchDomains": [],
            "extraHosts": [],
            "dockerSecurityOptions": [],
            "dockerLabels": {},
            "ulimits": [],
            "systemControls": [],
            "credentialSpecs": []
        }
    ],
```
One thing I'm experimenting with is passing the `PREFECT_LOGGING_EXTRA_LOGGERS` variable as a stream prefix. I was able to get the below working by setting the main task definition as follows, meaning no involvement from the work pool or `job_variables` for the deployment:
```
"logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "gridded-etl-logs",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "${PREFECT_LOGGING_EXTRA_LOGGERS}"
                }
            },
```
Unfortunately it didn't actually pull in the variable as the stream prefix. It just printed the literal `${PREFECT_LOGGING_EXTRA_LOGGERS}` as the prefix.
Currently my `job_variables` are set up to provide the `PREFECT_LOGGING_EXTRA_LOGGERS` variable, like so:
```
{
  "env": {
    "PREFECT_LOGGING_EXTRA_LOGGERS": "cpc_temp_min"
  }
}
```
My guess is that this environment variable is registered in such a way that the TD can’t access it
ideally I wouldn't need to take this approach and could specify the relevant parameters in the `job_variables`, but as mentioned in the initial post this isn't working
@Jake Kaplan I can also confirm that I don’t see any log options on the ECS task container for a successfully completing run, using the JSON configuration I shared
j
Hey! no worries. hope you had a good vacation
You definitely should be able to pass things through `cloudwatch_logs_options`, let me see if I'm able to reproduce this. If I'm not, I may need to enable some special debug logging and ask you to execute a couple of runs
r
OK, happy to execute some runs if that helps
Watching this closely, I'm observing three things:
1. If I specify cloudwatch options as part of the Task Definition, they work. But then I can't customize per deployment / run.
2. If I specify cloudwatch options as part of the `job_variables`, the flow hangs indefinitely until it crashes.
3. If I specify cloudwatch options as part of the Push Work Pool default settings, the flow runs but there are no corresponding Log Configuration Options under the corresponding task, and indeed no logs are output to the specified log group.
Unfortunately, without access to logs from the push work pool it's hard to see further what's going on. As a side note, the removal of logs is effectively a regression in capability vs. Agents.
j
hey, sorry for the delayed response! I spent some time digging into this a bit further and I'm not able to exactly recreate what you're seeing. I am able to pass log options via the deployment's `job_variables` and see those populate on the registered task definition. I think I asked you for this before, but can you confirm:
1. Does the task definition have logging configured? If not, are you able to share the full JSON for the definition?
2. Is it a Prefect-registered task definition vs. your own? (the revision family name would look like e.g. `prefect__fe745acc-1128-45f4-a4f4-cd1630740d51__16d65ba3-038f-440c-9216-469a6c653565` if it was generated by Prefect, as opposed to being passed yourself)
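For comparison, when the log options do come through on a Prefect-registered definition, I'd expect the container's `logConfiguration` to look roughly like this (just a sketch, values illustrative):

```json
"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "test-logs-group",
    "awslogs-region": "us-east-1",
    "awslogs-stream-prefix": "my-flow-run-name"
  }
}
```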
r
Hi @Jake Kaplan, thanks for digging into this. It sure sounds like we're reaching some sort of edge case here.
1. The Task Definition I'm testing against does not have logging configured. I'll DM you the configuration file.
   a. If I pass a Task Definition with logging configured, the logs correctly output to the specified group and use whatever prefix is set. But I am unable to override these settings via `job_variables` such that I can customize the stream prefix, which is important for logs discovery.
2. These are task definitions we register and revise ourselves.
Just to note, the current configuration works just fine on our current Agent-based setup. Digging into the Prefect source code, it seems like the Agent populates the log group, stream prefix, etc. during the entrypoint script. I don't strictly need to recreate that exact behavior, but perhaps it will shed some light on where the breakdown is?
👀 1
j
From what I can see, an Agent will pull an already-defined task definition, apply your configuration, and register a fresh definition. Work pools will either read an already-defined definition or generate a new one (if a matching one does not exist in the task definition family already), but they won't apply your configuration on top of an existing one. I wasn't aware of that difference in behavior, but let me see if I can find out whether it's intentional. Either way, that at least explains why the log options are not showing up, since you're providing your own task definition.
r
OK, this is a plausible explanation
I’m fine with registering a new TD if we auto-deregister afterwards. That’s not the biggest hassle.
Hey @Jake Kaplan, any progress on figuring out whether the difference we identified in how Agents vs. Work Pools pull TDs was intentional?
j
hey, sorry for the delayed response here. It does seem to be an intentional design choice. Either you provide your own ARN or we'll build one for you, but we won't attempt to mess with your ARN and apply values on top of it.
r
OK. For full awareness, do workers replicate the behavior of Agents, meaning they can modify ARNs? Or are they like push work pools and can't modify ARNs?
j
Workers and push work pools function under the same set of rules. Just to make sure I understand: you'd like to pass your own ARN as a base but then generate a new per-deployment definition to pass logging configuration? Are you able to either specify those in your own ARNs or let Prefect register a definition for you?
r
The key thing is passing a different `awslogs-stream-prefix` per flow run. This vastly improves logs discovery when things go wrong. In practice this means letting Prefect register a definition, since it's linked to a Prefect-triggered action (a flow run). Otherwise I can hard-code the prefix (and log group) in the TD, but then it's static. I tried referencing the `PREFECT_LOGGING_EXTRA_LOGGERS` env variable in the TD to apply a stream prefix that corresponds to the flow run in some capacity, but I think that's registered after the TD is applied so it can't pick it up.
j
Ahhh! Okay, so by default the `awslogs-stream-prefix` should default to the name of the flow run if you don't pass anything extra. I believe that is the same behavior as using agents with ECS infra
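If I remember the awslogs driver right, it names log streams `prefix-name/container-name/ecs-task-id`, so with that default you'd see streams like `<flow-run-name>/prefect/<ecs-task-id>` (assuming your container keeps the name `prefect`).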
r
That was the case
But I'm not seeing that behavior now if I leave all log configs empty on the TD, or leave just `awslogs-stream-prefix` blank
I’ve seen those lines you shared — they implicitly depend on modifying the TD via the Agent/Worker
unless I’m missing something?
j
hm okay. If you don't pass your own ARN and you leave logging config options blank, can you show me the log options you see on the task definition?
r
Well, as currently set up on my end, I have to pass my own ARN, but I can leave logging config options blank in that ARN. Is that OK? Or do I need to have the TD completely provided by the push work pool to replicate this behavior?
j
The default behavior, if you turn logging on and have Prefect registering ARNs for you, is to set `awslogs-stream-prefix` to the name of the flow run (e.g. `benevolent-pigeon`)
if you do pass your own ARN, you'll have to set it yourself. I'm not positive whether there's a way to dynamically specify the value in AWS like you were trying to do before with an env var; I'd have to look
r
OK
I’m going to try recreating the ARNs we currently store in AWS from scratch in the defaults of the work pool
So that Prefect can register them on the fly and the desired logging behavior can kick in
This will require hard-coding a lot of bits and bobs, e.g. `"networkMode": "awsvpc"`, that are currently handled by the TD. It seems the job template provided by the work pool is not an exact 1-to-1 match with task definitions in terms of where fields are nested, so if you have any examples of a job template providing all these things it would be a big help (my own attempt so far is sketched below).
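For reference, here's roughly what I've been trying to hand-build in the work pool's base job template. It's only a trimmed sketch of my own attempt, and I'm guessing at exactly where some of these fields nest under `task_definition` vs. `task_run_request`, so take the layout with a grain of salt:

```json
{
  "job_configuration": {
    "task_definition": {
      "family": "gridded-etl",
      "networkMode": "awsvpc",
      "cpu": "1024",
      "memory": "2048",
      "containerDefinitions": [
        {
          "name": "prefect",
          "image": "{{ image }}"
        }
      ]
    },
    "task_run_request": {
      "launchType": "FARGATE",
      "cluster": "{{ cluster }}"
    }
  }
}
```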
After digging in and considering my options, I think the easiest solution is hard-coding all log configuration details in the task definition and relying on that. The lack of intelligible log names is a minor tradeoff for greatly reduced code complexity, since it would take many lines to recreate these task definitions on the fly, and tbh the mapping between the jinja ECS work pool template and AWS task definition templates is not very clear and sometimes behaves unpredictably (refusing to accept hard-coded values). Thanks for your help on this @Jake Kaplan.
j
Understood, that makes sense! And no problem. I'm sorry we weren't able to get a perfect solution, but I'm glad you have a path forward. Good luck with the rest of your migration!
r
No worries, not every path ends in a rainbow / pot of gold / choose your metaphor. I really appreciate your assistance, I would have been lost without it!
💙 1