# ask-community
s
Hi all, any AWS + ECS experts here who have figured out how to monitor the CPU and memory utilization of `ECSRun` flows that execute on Fargate? I've noticed a couple of times that my ECS task will run out of memory and kill my flow, but when I go to the ECS (or CloudWatch) console, there's no way for me to actually examine those metrics for the individual ECS task that was kicked off via my `ECSAgent`. From my digging around, it seems like it might be because the ECS service my task definition is tied to isn't actually the thing "responsible" for launching the ECS task (the ECS dashboard says the service is running 0 tasks even though the flow and the underlying ECS task are clearly running), but I'm not sure. I'm just trying to get better insight into the actual memory/CPU requirements of my flow without having to, say, briefly move its execution to EC2, monitor it there, and then move it back to ECS Fargate... Many thanks in advance for any tips!
a
Sorry to hear that, and I can confirm that something is not working properly when setting custom CPU/memory overrides on `ECSRun` - I opened an issue here: https://github.com/PrefectHQ/prefect/issues/5550
s
oh, I'm actually not even passing custom memory/CPU requirements to my task via Prefect right now - we define our ECS tasks outside of Prefect (via Terraform), make sure the launch type of the associated service is Fargate, and then just supply the task definition ARN to Prefect. The issue I'm having is that when I go to AWS Console > ECS > `prefect-cluster` > `ecs-service-for-flow-x` > Metrics, nothing at all shows up in those graphs
but it's definitely good to know that this is an open issue 😎
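for context, wiring an externally managed task definition into Prefect looks roughly like this (a sketch assuming Prefect 1.x's `ECSRun`; the ARN and flow name are placeholders, not values from this thread):

```python
from prefect import Flow
from prefect.run_configs import ECSRun

# Placeholder ARN for a task definition managed outside Prefect (e.g. via Terraform)
run_config = ECSRun(
    task_definition_arn="arn:aws:ecs:us-east-1:123456789012:task-definition/flow-x:1",
)

with Flow("flow-x", run_config=run_config) as flow:
    ...  # tasks go here
```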
a
what do you mean by ECS service?
Prefect deploys flow runs as standalone ECS tasks that run once and then finish - sort of like Kubernetes jobs, i.e. batch jobs. An ECS service is more for something like an API that you want running 24/7. A service is something you could use for your ECS agent, to ensure that one ECS task with the Prefect agent container is reliably running at all times, but flow runs are deployed completely separately from that.
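since those flow-run tasks aren't attached to any service, one way to find them is via the AWS CLI (a sketch; `prefect-cluster` and the task ARN are placeholders):

```shell
# Standalone tasks don't count toward any service's task total, but they are
# still listed at the cluster level, including recently stopped ones:
aws ecs list-tasks --cluster prefect-cluster --desired-status STOPPED

# Inspect one of them; for OOM-killed containers, stoppedReason typically
# mentions "OutOfMemoryError: Container killed due to memory usage":
aws ecs describe-tasks --cluster prefect-cluster --tasks <task-arn>
```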
πŸ‘ 1
s
yeah, I'm realizing that now - I configured much of this long before I understood how to properly use ECS, so I should just nuke the ECS services I have defined in my cluster. But even if those go away, I still don't think I'd be able to see the memory/CPU utilization of the tasks anywhere
a
If you can't see that, then something is definitely not working right. You should be able to navigate first to your stopped ECS tasks (stopped just means the flow run finished, whether it succeeded or failed) and from there to the task definition of that ECS task
@Sean Talia Last year I wrote a blog post about setting up an ECS agent here, and we also have this Terraform recipe for setting up an ECS agent - sharing in case it's useful
s
yes, I remember reading that! it's on our to-do list to move our ECS agent to an ECS service; right now we're running it on an EC2 instance, but we have a ticket to move it
and yeah, I can definitely see the memory/CPU that's been allocated for the Fargate task; I was just hoping to be able to see what's actually being consumed
a
oh I see. Maybe you can add some extra code and log messages to track it, e.g.:

```python
import psutil

# Average system-wide CPU utilization over a 4-second sampling window
print('The CPU usage is:', psutil.cpu_percent(4))
# virtual_memory()[2] is the percentage of RAM currently in use
print('RAM memory % used:', psutil.virtual_memory()[2])
```
you could add it at the beginning and again after doing some processing - maybe as a single task that you call multiple times in your flow, so you capture this extra data in your flow logs?
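if adding `psutil` to the flow image isn't convenient, a stdlib-only variant of the same idea uses the `resource` module to log the process's peak memory at checkpoints (the helper names here are my own, not a Prefect API; note `ru_maxrss` is reported in kB on Linux but bytes on macOS):

```python
import os
import resource

def peak_rss() -> int:
    """Peak resident set size of this process (kB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def log_usage(stage: str) -> int:
    """Print a checkpoint line; call at the start and end of heavy tasks."""
    rss = peak_rss()
    print(f"[{stage}] pid={os.getpid()} peak RSS ~ {rss}")
    return rss

log_usage("start")
buf = [0] * 1_000_000  # simulate some work that allocates memory
log_usage("after allocation")
```

calling this at a few points per flow run gives a rough high-water mark you can compare against the task definition's memory setting.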
s
cool! I didn't know about this library... I'll have to play around with it. I have to say I'm a little disappointed that AWS doesn't make this info available for Fargate tasks out of the box - I was hoping to get it all for free for all the flows we're running as ECS tasks. This is all super helpful, thank you @Anna Geller!
πŸ‘ 1