# ask-community
s
Hi all, any AWS + ECS experts here who have figured out how to monitor the CPU and memory utilization of `ECSRun` flows that execute on Fargate? I've noticed a couple of times that my ECS task will run out of memory and kill my flow, but when I go to the ECS (or CloudWatch) console, there's no way for me to actually examine those metrics for the individual ECS task that was kicked off via my `ECSAgent`. From my digging around, it seems like it might be because the ECS service my task definition is tied to isn't actually the thing "responsible" for launching the ECS task (the ECS dashboard says the service is running 0 tasks even though the flow and the underlying ECS task are clearly running), but I'm not sure. I'm just trying to get better insight into the actual memory/CPU requirements of my flow without having to, say, briefly move its execution to EC2, monitor it there, and then move it back to ECS Fargate... Many thanks in advance for any tips!
a
Sorry to hear that, and I can confirm that something is not working properly when setting custom CPU/memory overrides on `ECSRun` - I opened an issue here: https://github.com/PrefectHQ/prefect/issues/5550
s
oh, I'm actually not even passing custom memory/CPU requirements to my task via Prefect right now - we define our ECS tasks outside of Prefect (via Terraform), make sure the launch type of the associated service is Fargate, and then just supply the task definition ARN to Prefect. The issue I'm having is that when I go to AWS Console > ECS > `prefect-cluster` > `ecs-service-for-flow-x` > Metrics, nothing at all shows up in those graphs
but it's definitely good to know that this is an open issue 😎
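for context, wiring an externally managed task definition into Prefect looks roughly like this (a sketch assuming Prefect 1.x's `ECSRun`; the ARN and flow name are placeholders, not values from this thread):

```python
from prefect import Flow
from prefect.run_configs import ECSRun

# Placeholder ARN for a task definition managed outside Prefect (e.g. via Terraform)
run_config = ECSRun(
    task_definition_arn="arn:aws:ecs:us-east-1:123456789012:task-definition/flow-x:1",
)

with Flow("flow-x", run_config=run_config) as flow:
    ...  # tasks go here
```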
a
what do you mean by ECS service?
Prefect deploys flow runs as standalone ECS tasks that run once and then finish - sort of like Kubernetes jobs, i.e. batch jobs. An ECS service is more for something like an API that you want running 24/7. A service is something you could use for your ECS agent, to ensure that one ECS task with the Prefect agent container is reliably running at all times, but flow runs are deployed completely separately from that.
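since those flow-run tasks aren't attached to any service, one way to find them is via the AWS CLI (a sketch; `prefect-cluster` and the task ARN are placeholders):

```shell
# Standalone tasks don't count toward any service's task total, but they are
# still listed at the cluster level, including recently stopped ones:
aws ecs list-tasks --cluster prefect-cluster --desired-status STOPPED

# Inspect one of them; for OOM-killed containers, stoppedReason typically
# mentions "OutOfMemoryError: Container killed due to memory usage":
aws ecs describe-tasks --cluster prefect-cluster --tasks <task-arn>
```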
πŸ‘ 1
s
yeah, I'm realizing that now - I configured much of this long before I understood how to properly use ECS, so I should just nuke the ECS services I have defined in my cluster. But even if those go away, I still don't think I'd be able to see the memory/CPU utilization of the tasks anywhere
a
If you can't see that, then something is definitely not working right. You should be able to navigate first to your stopped ECS tasks (stopped just means the flow run finished, whether it succeeded or failed) and from there to the task definition of that ECS task
@Sean Talia Last year I wrote a blog post about setting up an ECS agent here, and we also have this Terraform recipe for setting up an ECS agent - sharing in case it's useful
s
yes, I remember reading that! it's on our to-do list to move our ECS agent to an ECS service; right now we're running it on an EC2 instance, but we have a ticket to move it
and yeah, I can definitely see the memory/CPU that's been allocated for the Fargate task; I was just hoping to be able to see what's actually being consumed
a
oh I see. Maybe you can add some extra code and log messages to track it, e.g.:

```python
import psutil

# Average system-wide CPU utilization over a 4-second sampling window
print('The CPU usage is:', psutil.cpu_percent(4))
# virtual_memory()[2] is the percentage of RAM currently in use
print('RAM memory % used:', psutil.virtual_memory()[2])
```
you could add it at the beginning and again after doing some processing - maybe as a single task that you call multiple times in your flow, so you capture this extra data in your flow logs?
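if adding `psutil` to the flow image isn't convenient, a stdlib-only variant of the same idea uses the `resource` module to log the process's peak memory at checkpoints (the helper names here are my own, not a Prefect API; note `ru_maxrss` is reported in kB on Linux but bytes on macOS):

```python
import os
import resource

def peak_rss() -> int:
    """Peak resident set size of this process (kB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def log_usage(stage: str) -> int:
    """Print a checkpoint line; call at the start and end of heavy tasks."""
    rss = peak_rss()
    print(f"[{stage}] pid={os.getpid()} peak RSS ~ {rss}")
    return rss

log_usage("start")
buf = [0] * 1_000_000  # simulate some work that allocates memory
log_usage("after allocation")
```

calling this at a few points per flow run gives a rough high-water mark you can compare against the task definition's memory setting.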
s
cool! I didn't know about this library... I'll have to play around with it. I have to say I'm a little disappointed that AWS doesn't make this info available for Fargate tasks out of the box - I was hoping to get it all for free for all the flows we're running as ECS tasks. This is all super helpful, thank you @Anna Geller!
πŸ‘ 1