All of our jobs in our prefect cloud hybrid workpo...
# ask-community
l
All of our jobs in our Prefect Cloud hybrid work pool have been marked as Late and have not been processing all day. Our worker is up, and the ECS infrastructure that runs the jobs the worker launches is up, but nothing is moving. I understood there were problems during the AWS outage, but since Prefect gave the all clear I expected things to start functioning again, and they have not. Is anyone else seeing this?
Ah, we were blocked by our concurrency limit, which had been hit by pending tasks from this morning.
a
Yeah, too many late tasks will also clog up a queue sometimes
Even if the pending ones are cleared out
t
Hi, I am having a similar issue, and I may be confused by @Austin Weisgrau’s response. You're saying the queue will remain clogged "even if the pending ones are cleared out"? I have a hybrid work pool with two queues, a CPU queue and a GPU queue, that use AWS ECS to launch EC2 instances. My work queue concurrency limit was 128 (my work pool concurrency limit is Unlimited). I submitted 1736 jobs. 1675 came back fine and 61 remained Pending. I deleted 3 of them and resubmitted, and those 3 finished fine. So currently I have 58 Pending jobs in Prefect, and they've been Pending since yesterday. No other jobs are in the queue or even in the work pool. The ECS logs do not show any errors, but the auto-scalers are not scaling up and no job is running on the EC2 instances that are up. The flow run is shown as Pending here (attached screenshot). Any help is greatly, greatly appreciated.
a
It can happen that a very clogged queue stays clogged even with no pending runs. There have been a few bug reports to that effect recently, but the Prefect team may have fixed them already.
t
Thanks for the response @Austin Weisgrau! I'll try updating my prefect version and re-submitting just now.
s
is there a way to clear/reset a clogged queue?
a
In theory, deleting all stuck "late" runs should do it. With the bug from earlier this summer you'd have to delete all "pending" runs as well, though that bug might not be recurring anymore.
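A rough sketch of that cleanup, in case it helps anyone landing here. It assumes a Prefect 2.x/3.x Python client where `get_client`, `read_flow_runs`, and `delete_flow_run` behave as in recent releases; the `RunInfo` helper, the one-hour age cutoff, and the `dry_run` flag are made up for illustration and are not part of Prefect. It is not scoped to a specific work pool or queue, so review the dry-run output before deleting anything.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List
from uuid import UUID

# State names discussed in this thread; adjust if only "Late" runs should go.
STUCK_STATES = {"Late", "Pending"}


@dataclass
class RunInfo:
    """Hypothetical minimal view of a flow run (not a Prefect class)."""
    id: str
    state_name: str
    expected_start: datetime


def select_stuck_runs(
    runs: Iterable[RunInfo],
    now: datetime,
    min_age: timedelta = timedelta(hours=1),  # assumed cutoff, tune as needed
) -> List[RunInfo]:
    """Keep runs that are Late/Pending and at least `min_age` past their start."""
    return [
        r for r in runs
        if r.state_name in STUCK_STATES and now - r.expected_start >= min_age
    ]


async def delete_stuck_runs(dry_run: bool = True) -> List[str]:
    """List (and optionally delete) stuck runs via the Prefect client."""
    # Imported lazily so the selection logic above works without Prefect installed.
    from prefect.client.orchestration import get_client
    from prefect.client.schemas.filters import (
        FlowRunFilter,
        FlowRunFilterState,
        FlowRunFilterStateName,
    )

    stuck_filter = FlowRunFilter(
        state=FlowRunFilterState(
            name=FlowRunFilterStateName(any_=sorted(STUCK_STATES))
        )
    )
    async with get_client() as client:
        # Results are paged by the API, so a large backlog may need several passes.
        runs = await client.read_flow_runs(flow_run_filter=stuck_filter)
        candidates = select_stuck_runs(
            [
                RunInfo(str(r.id), r.state_name, r.expected_start_time)
                for r in runs
                if r.expected_start_time is not None
            ],
            now=datetime.now(timezone.utc),
        )
        if not dry_run:
            for run in candidates:
                await client.delete_flow_run(UUID(run.id))
    return [run.id for run in candidates]


if __name__ == "__main__":
    # Dry run by default: prints the ids that would be deleted.
    print(asyncio.run(delete_stuck_runs(dry_run=True)))
```

Running it once with `dry_run=True` and checking the printed ids against the UI is probably the safer first step; deleted runs have to be resubmitted afterwards.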
t
Just to close out my experience here. I tried two things:
1. Define the `task_definition_arn` so Prefect does not create an ECS task and consequently try to create two tasks at the same time. This was something we actually wanted to avoid within our infrastructure for specific reasons.
2. Set `Match Latest Revision In Family` to on.
   a. For this to work consistently, if I change the task definition for the Prefect worker, I run a single job first; after that, Prefect seems to grab the latest revision in the family consistently, without race conditions.
Do these pertain to the bug fixed in the summer? I'm not sure. But this seems to be working for us at the moment. I hope the info is helpful here.
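For anyone who wants to set the two options above from code rather than the UI, here is a minimal sketch. It assumes the ECS work pool exposes them as the job variables `task_definition_arn` and `match_latest_revision_in_family` (the names used by the prefect-aws ECS worker as far as I know); the helper function and the ARN are hypothetical placeholders.

```python
from typing import Dict, Optional, Union

JobVariables = Dict[str, Union[str, bool]]


def ecs_job_variables(
    task_definition_arn: Optional[str] = None,
    match_latest_revision_in_family: bool = False,
) -> JobVariables:
    """Build a job-variables override for an ECS work pool (hypothetical helper)."""
    variables: JobVariables = {}
    if task_definition_arn:
        # Option 1 from the thread: point at an existing task definition.
        variables["task_definition_arn"] = task_definition_arn
    if match_latest_revision_in_family:
        # Option 2 from the thread: the "Match Latest Revision In Family" toggle.
        variables["match_latest_revision_in_family"] = True
    return variables


# Placeholder ARN; substitute the task definition registered in your account.
pinned = ecs_job_variables(
    task_definition_arn=(
        "arn:aws:ecs:us-east-1:123456789012:task-definition/my-flow-family:7"
    )
)
latest = ecs_job_variables(match_latest_revision_in_family=True)
print(pinned)
print(latest)
```

A dict like this would typically be passed as `job_variables=` when deploying a flow, or written under `work_pool.job_variables` in `prefect.yaml`; check that your worker version accepts both keys before relying on it.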