I am noticing issues with ECS and our push work po...
# ask-community
c
I am noticing issues with ECS and our push work pools as well surrounding this with sudden high CPU usage. Is this still happening today?
All of our jobs suddenly went into LATE status as of 8:00 AM Mountain Time today.
j
hey, nothing that I am aware of right now. Can you share workspace id and work pool id?
c
Workspace ID f375b1eb-767e-43c6-abe2-3c1d1043706a Workpool ID: 18784370-635d-49e9-b1cb-d048f829dadf
j
Just to make sure I have the right one, that work pool does not appear to be a push pool
I do see heartbeats from that worker stop a few hours ago? Is that worker still running? If it's stopped for some reason that would explain the late flow runs as there is nothing to pick them up
c
It appears to still be running, but the tasks are failing
j
Sorry, when you say tasks you mean there are errors on the worker itself? Or prefect tasks?
c
Correct, in ECS it seems we are getting stop errors for some reason but the tasks are populating. It just can't grab the image it seems?
j
Gotcha. Can at least definitely confirm there's no systemic issue with Prefect Cloud. It sounds like it's possibly an issue between ECS and your image repository? Assuming your image tag is valid, I've seen transient network issues before, or things like Docker Hub rate limits (not sure if you're using Docker Hub). I may not be able to help so much here, but happy to take a look if you have an error to share?
c
This was mainly just to ensure it is an us problem, I'll need to continue investigating on our end.
👍 1
Thanks for your help!
j
no problem, sounds good! best of luck 🤞
c
It seems the orchestrator (the work pool manager) is failing for us.
Here is the log
j
This is from the worker? it's a little hard to read the log to understand what the actual error is/where it's occurring
c
Ha, yeah it's just as confusing to us
😅 1
So a little context.
j
are you able to get a traceback?
c
There is only one task in ECS that keeps rotating, it starts and dies every three-ish minutes
j
If this started happening out of "nowhere" it's possible it's a dependency issue? are all your packages pinned?
c
All of our packages are pinned yes
It's something with pydantic
from prefect specifically, I believe
once it hits that point in the log, error traces start to show up
j
a little hard to tell where in the worker process that is, but is that just after running `prefect worker start ...`?
c
It was due to the prefect image
we upgraded to latest and that fixed it
prefect-recipes was also archived a month ago?
j
oh great! sounds like maybe some sort of version conflict? I saw in that log it looked like you were installing the latest prefect, I thought
c
Yeah, 3.0.1 -> 3.2.9 and that fixed it
🔥 2
but like.. it was pinned lol. I guess I am confused how that would happen
Why would there be dependency changes on a pinned package from months ago
j
3.2.9 came out today and I saw that was getting pip installed in your logs
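One way to catch this kind of drift early is to compare the pins against what actually ended up installed in the container. The following is only a rough sketch: the helper names (`parse_pins`, `installed_version`, `find_drift`) are hypothetical, the example versions other than 3.0.1 and 3.2.9 are made up, and it only understands plain `name==version` lines (no extras, environment markers, or hashes).

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for every exact `name==version` line."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned or blank lines are ignored in this sketch
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name):
    """Look up the installed version, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_drift(pins, lookup=installed_version):
    """Return {package: (pinned, actual)} wherever the two differ."""
    drift = {}
    for name, pinned in pins.items():
        actual = lookup(name)
        if actual != pinned:
            drift[name] = (pinned, actual)
    return drift
```

Running `find_drift(parse_pins(open("requirements.txt").read()))` at container startup (the file name is an assumption) and logging a non-empty result would flag the case where an image or entrypoint installs a newer prefect than the one that was pinned.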
b
Hey Chandler! The newest home for prefect examples can be found here.
👀 1