# ask-community
r
Could someone from @Prefect clarify when and how `job_variables` overrides are passed through to ECS containers in a push work pool? I have a push work pool set up to receive custom `cloudwatch_logs_options` from each deployment linked to that work pool (`stream_output` and `configure_cloudwatch_logs` default to True). I'm able to pass through options like the ones below, but flow runs' logs don't actually output to the specified log group.
```
"cloudwatch_logs_options": {
    "awslogs-region": "us-east-1",
    "awslogs-group": "test-logs-group",
    "awslogs-create-group": "false",  # already exists
    "awslogs-stream-prefix": "my_deployment_name"
  }
```
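(For reference, here's a sketch of the full `job_variables` payload I'm describing for a deployment on this work pool. I'm assuming the variable names exposed by the ECS work pool template are `configure_cloudwatch_logs`, `stream_output`, and `cloudwatch_logs_options`; the values mirror the snippet above.)

```json
{
  "configure_cloudwatch_logs": true,
  "stream_output": true,
  "cloudwatch_logs_options": {
    "awslogs-region": "us-east-1",
    "awslogs-group": "test-logs-group",
    "awslogs-create-group": "false",
    "awslogs-stream-prefix": "my_deployment_name"
  }
}
```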
A script I use to pull container definitions at runtime further reveals that no log options were configured at the container level, unlike previously under our Agent-based system. I've checked the related IAM role and it has CloudWatchLogsFullAccess enabled, so it's not a permissions issue. The work pool is in any case using credentials that worked under our previous Agent-based setup. Unfortunately, the lack of push work pool logs makes it impossible to troubleshoot further at the work pool level. I've dug around quite a lot in the latest `prefect` and `prefect_aws` code, since the docs are sparse on these topics, but can't seem to find where or how work pools pass through job variables, and ECS ones in particular. Help would be really appreciated! I've gone well into the weeds but feel like the returns to self-directed inquiry are shrinking.
j
hey, could you share your work pool id? I don't know if it's the issue, but `configure_cloudwatch_logs` does need to be explicitly set to True (it does not default to on)
r
I manually set `configure_cloudwatch_logs` and `stream_output` to default to True in the job template
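(Roughly like this in the base job template, trimmed to just those two variables; the real template has many more properties. This is a sketch assuming the standard work pool layout of a `variables` JSON schema alongside `job_configuration`.)

```json
{
  "variables": {
    "type": "object",
    "properties": {
      "configure_cloudwatch_logs": {
        "type": "boolean",
        "default": true
      },
      "stream_output": {
        "type": "boolean",
        "default": true
      }
    }
  }
}
```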
actually, please use this work pool: c93d1948-f85b-4747-8c40-1b61f318e82e
j
thank you! will try and take a look in a bit
r
Thanks!
j
just to confirm some things:
• your runs are finishing without error
• you're not seeing any log options on the ECS task container
could you share the JSON for `ContainerDefinitions` from your Task Definition revision?
r
Hey @Jake Kaplan, sorry for dropping the ball on this, our conversation happened right as I was heading out on vacation
I’m looping back to this now
here's the relevant `ContainerDefinitions` JSON:
```
"containerDefinitions": [
        {
            "name": "prefect",
            "image": "421396523132.dkr.ecr.us-east-1.amazonaws.com/prefect:gridded-etl-dev-latest",
            "cpu": 0,
            "links": [],
            "portMappings": [],
            "essential": true,
            "entryPoint": [],
            "command": [],
            "environment": [],
            "environmentFiles": [],
            "mountPoints": [],
            "volumesFrom": [],
            "dnsServers": [],
            "dnsSearchDomains": [],
            "extraHosts": [],
            "dockerSecurityOptions": [],
            "dockerLabels": {},
            "ulimits": [],
            "systemControls": [],
            "credentialSpecs": []
        }
    ],
```
One thing I'm experimenting with is passing the `PREFECT_LOGGING_EXTRA_LOGGERS` variable as a stream prefix. I was able to get the below working by setting the main task definition as follows, meaning no involvement from the work pool or `job_variables` for the deployment:
```
"logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "gridded-etl-logs",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "${PREFECT_LOGGING_EXTRA_LOGGERS}"
                }
            },
```
Unfortunately it didn't actually pull in the variable as the stream prefix. It just printed the literal `${PREFECT_LOGGING_EXTRA_LOGGERS}` as the prefix.
Currently my `job_variables` are set up to provide the `PREFECT_LOGGING_EXTRA_LOGGERS` variable, like so:
```
{
  "env": {
    "PREFECT_LOGGING_EXTRA_LOGGERS": "cpc_temp_min"
  }
}
```
My guess is that this environment variable is registered in such a way that the TD can’t access it
ideally I wouldn't need to take this approach and could specify the relevant parameters in the `job_variables`, but as mentioned in the initial post this isn't working
@Jake Kaplan I can also confirm that I don’t see any log options on the ECS task container for a successfully completing run, using the JSON configuration I shared
j
Hey! no worries. hope you had a good vacation
You definitely should be able to pass things through `cloudwatch_logs_options`, let me see if I'm able to reproduce this. If I'm not, I may need to enable some special debug logging and ask you to execute a couple of runs
r
OK, happy to execute some runs if that helps
Watching this closely, I'm observing three things:
1. If I specify cloudwatch options as part of the Task Definition, they work. But then I can't customize per deployment / run.
2. If I specify cloudwatch options as part of the `job_variables`, the flow hangs indefinitely until it crashes.
3. If I specify cloudwatch options as part of the Push Work Pool default settings, the flow runs but there are no corresponding Log Configuration Options under the corresponding task, and indeed no logs are output to the specified log group.
Unfortunately, without access to logs from the push work pool it's hard to see further what's going on. As a side note, the removal of logs is effectively a regression in capability vs. Agents.
j
hey, sorry for the delayed response! I spent some time digging into this a bit further and I'm not able to exactly recreate what you're seeing. I am able to pass log options via the deployment's `job_variables` and see those populate on the registered task definition. I think I asked you for this before, but can you confirm:
1. Does the task definition have logging configured? If not, are you able to share the full JSON for the definition?
2. Is it a Prefect-registered task definition vs. your own? (the revision family name would look like e.g. `prefect__fe745acc-1128-45f4-a4f4-cd1630740d51__16d65ba3-038f-440c-9216-469a6c653565` if it was generated by Prefect, as opposed to being passed yourself)
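For comparison, when the log options do come through on a Prefect-registered definition, I'd expect the container's `logConfiguration` to look roughly like this (just a sketch, values illustrative):

```json
"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "test-logs-group",
    "awslogs-region": "us-east-1",
    "awslogs-stream-prefix": "my-flow-run-name"
  }
}
```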
r
Hi @Jake Kaplan, thanks for digging into this. It sure sounds like we're reaching some sort of edge case here.
1. The Task Definition I'm testing against does not have logging configured. I'll DM you the configuration file.
   a. If I pass a Task Definition with logging configured, the logs correctly output to the specified group and use whatever prefix is set. But I am unable to override these settings via `job_variables` such that I can customize the stream prefix, which is important for logs discovery.
2. These are task definitions we register and revise ourselves.
Just to note, the current configuration works just fine on our current Agent-based setup. Digging into the Prefect source code, it seems like the Agent populates the log group, stream prefix, etc. during the entrypoint script. I don't strictly need to recreate that exact behavior, but perhaps it will shed some light on where the breakdown is?
👀 1
j
From what I can see, an Agent will pull an already-defined task definition, apply your configuration, and register a fresh definition. Work pools will either read an already-defined definition or generate a new one (if a matching one does not exist in the task definition family already), but they won't apply your configuration on top of an existing one. I wasn't aware of that difference in behavior, but let me see if I can find out whether it's intentional. Either way, that at least explains why the log options are not showing up, since you're providing your own task definition.
r
OK, this is a plausible explanation
I’m fine with registering a new TD if we auto-deregister afterwards. That’s not the biggest hassle.
Hey @Jake Kaplan, any progress on figuring out whether the difference we identified in how Agents vs. Work Pools pull TDs was intentional?
j
hey, sorry for the delayed response here. It does seem to be an intentional design choice. Either you provide your own ARN or we'll build one for you, but we won't attempt to mess with your ARN and apply values on top of it.
r
OK. For full awareness, do workers replicate the behavior of Agents, meaning they can modify ARNs? Or are they like push work pools and can't modify ARNs?
j
Workers and push work pools function under the same set of rules. Just to make sure I understand: you'd like to pass your own ARN as a base but then generate a new per-deployment definition to pass logging configuration? Are you able to either specify those in your own ARNs or let Prefect register a definition for you?
r
The key thing is passing a different `awslogs-stream-prefix` per flow run. This vastly improves logs discovery when things go wrong. In practice this means letting Prefect register a definition, since it's linked to a Prefect-triggered action (a flow run). Otherwise I can hard-code the prefix (and log group) in the TD, but then it's static. I tried referencing the `PREFECT_LOGGING_EXTRA_LOGGERS` env variable in the TD to apply a stream prefix that corresponds to the flow run in some capacity, but I think that's registered after the TD is applied so it can't pick it up.
j
Ahhh! Okay, so by default the `awslogs-stream-prefix` should default to the name of the flow run if you don't pass anything extra. I believe that is the same behavior as using agents with ECS infra
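If I remember the awslogs driver right, it names log streams `prefix-name/container-name/ecs-task-id`, so with that default you'd see streams like `<flow-run-name>/prefect/<ecs-task-id>` (assuming your container keeps the name `prefect`).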
r
That was the case
But I'm not seeing that behavior now if I leave all log configs empty on the TD, or leave just `awslogs-stream-prefix` blank
I’ve seen those lines you shared — they implicitly depend on modifying the TD via the Agent/Worker
unless I’m missing something?
j
hm okay. If you don't pass your own ARN and you leave logging config options blank, can you show me the log options you see on the task definition?
r
Well, as currently set up on my end, I have to pass my own ARN, but I can leave logging config options blank in that ARN. Is that OK? Or do I need to have the TD completely provided by the push work pool to replicate this behavior?
j
The default behavior, if you turn logging on and have Prefect registering ARNs for you, is to set `awslogs-stream-prefix` to the name of the flow run (e.g. `benevolent-pigeon`)
if you do pass your own ARN, you'll have to set it yourself. I'm not positive whether there's a way to dynamically specify the value in AWS like you were trying to do before with an env var; I'd have to look
r
OK
I’m going to try recreating the ARNs we currently store in AWS from scratch in the defaults of the work pool
So that Prefect can register them on the fly and the desired logging behavior can kick in
This will require hard-coding a lot of bits and bobs, e.g. `"networkMode": "awsvpc"`, that are currently handled by the TD. It seems the job template provided by the work pool is not an exact 1-to-1 match with task definitions in terms of where fields are nested, so if you have any examples of a job template providing all these things it would be a big help (my own attempt so far is sketched below).
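For reference, here's roughly what I've been trying to hand-build in the work pool's base job template. It's only a trimmed sketch of my own attempt, and I'm guessing at exactly where some of these fields nest under `task_definition` vs. `task_run_request`, so take the layout with a grain of salt:

```json
{
  "job_configuration": {
    "task_definition": {
      "family": "gridded-etl",
      "networkMode": "awsvpc",
      "cpu": "1024",
      "memory": "2048",
      "containerDefinitions": [
        {
          "name": "prefect",
          "image": "{{ image }}"
        }
      ]
    },
    "task_run_request": {
      "launchType": "FARGATE",
      "cluster": "{{ cluster }}"
    }
  }
}
```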
After digging in and considering my options, I think the easiest solution is hard-coding all log configuration details in the task definition and relying on that. The lack of intelligible log names is a minor tradeoff for greatly reduced code complexity, since it would take many lines to recreate these task definitions on the fly, and tbh the mapping between the jinja ECS work pool template and AWS task definition templates is not very clear and sometimes behaves unpredictably (refusing to accept hard-coded values). Thanks for your help on this @Jake Kaplan.
j
Understood, that makes sense! And no problem. I'm sorry we weren't able to get a perfect solution, but I'm glad you have a path forward. Good luck with the rest of your migration!
r
No worries, not every path ends in a rainbow / pot of gold / choose your metaphor. I really appreciate your assistance, I would have been lost without it!
💙 1