# ask-community
k
Hey team, somehow today our agent is not picking up any flow runs. The last run happened around 24 hours ago, and since then it's gone quiet. I restarted it and it just sits there reporting `Waiting for flow runs...`
We are cloud users, and we don't think we touched anything at all since it was last healthy (it was a bank holiday here yesterday in fact)
I see this, did something change around labels?
j
Hi @Kostas Chalikias - looks like there's a mismatch between the labels on your agent and the labels on your flow/flow runs. Can you check if there were any changes?
k
pretty sure there weren't, we noticed this in the release notes
but we haven't updated our prefect-core recently or anything like that (we use local agent)
j
Ok. Can you check what labels are on your agent and what labels are on one of the flow runs you want it to pick up?
There's a brief explanation of labels here, but basically the agent's labels need to be a superset of your flow run's labels.
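To make the superset rule concrete, here is a minimal sketch in plain Python. It does not call any Prefect API; the function name `agent_can_pick_up` and the example label values are hypothetical and only illustrate the set comparison described above.

```python
def agent_can_pick_up(agent_labels, flow_run_labels):
    """Return True if every label on the flow run is also present on the agent.

    Hypothetical helper for illustration only; it mirrors the rule that the
    agent's labels must be a superset of the flow run's labels.
    """
    return set(flow_run_labels) <= set(agent_labels)


# Example label sets (made-up values, not taken from a real deployment).
agent_labels = {"staging", "my-hostname", "flow-a", "flow-b"}

print(agent_can_pick_up(agent_labels, {"staging", "my-hostname"}))  # True
print(agent_can_pick_up(agent_labels, {"production"}))              # False
```

Extra labels on the agent are harmless under this rule; a single label on the flow run that the agent lacks is enough to stop the match.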
k
will check & revert
@Freddie
šŸ‘ 1
f
Hi @Jenny, I've started an agent up locally to try and test and debug things. It's got labels `staging` (corresponding to our environment) and my hostname (from `<flow_name>.storage.labels` I believe) amongst a list of all the names of all the flows we have. The flow I'm trying to run has just the two labels (`staging` and my hostname). Nothing gets sent to the agent though when I hit run from the UI. It just says waiting for flow runs. I'm starting to suspect this isn't just labels. Wondered if you had any thoughts
j
Hmmm.... that sounds correct. Let me see what else might be the issue.
f
Thanks Jenny
j
Hi @Freddie - a few questions to see if we can figure this out: • Is this happening with all your agents? Or just one of them? • Same question with flows - are all affected? Or just one? • What version of core are you running? Is the agent running with the same version? Thanks!
f
We've been running the latest 0.13.x and indeed still are on our production stack. We've deployed 0.14.x locally and on our staging instance. This is happening across all agents - we have one per stack, and then I'm running one locally too which mirrors staging.
We've been seeing this across different flows across both of our environments / stacks.
k
@Jenny any other thoughts on this? Being without our scheduler is obviously not ideal
j
Hi @Kostas Chalikias - Looking into it right now to see if we can figure out what's going on.
šŸ‘ 2
Sorry some more questions to make sure I'm not missing anything: • are you are able to run a flow run using the run page/quick run button? • have you used the start now button at all?
f
Hi @Jenny - we've been triggering runs using the quick run and run page and they've been scheduling fine, but they never start, even after pressing the start now button. We just get an additional scheduled line on the timeline.
j
OK - I think I know the issue. Do you by any chance have more than 625 late runs? Our work queue pauses over that number and so no new runs will get picked up.
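As a rough illustration of the threshold described above, the following plain-Python sketch counts late runs and checks them against the limit. The 625 figure comes from this thread; the names `LATE_RUN_LIMIT`, `count_late_runs`, and `work_queue_paused`, and the definition of "late" as "scheduled start time already passed", are assumptions made for the example and are not Prefect APIs.

```python
from datetime import datetime, timedelta, timezone

# Figure quoted in this thread; treated here as an assumption, not a documented constant.
LATE_RUN_LIMIT = 625


def count_late_runs(scheduled_start_times, now):
    """Count runs whose scheduled start time has already passed.

    'Late' is defined this way only for the purposes of this sketch.
    """
    return sum(1 for start in scheduled_start_times if start < now)


def work_queue_paused(late_run_count, limit=LATE_RUN_LIMIT):
    """Return True when the number of late runs is over the limit."""
    return late_run_count > limit


# Arbitrary reference time and made-up schedule data for the example.
now = datetime(2021, 1, 5, 12, 0, tzinfo=timezone.utc)
overdue = [now - timedelta(minutes=i + 1) for i in range(626)]
upcoming = [now + timedelta(minutes=5)]

late = count_late_runs(overdue + upcoming, now)
print(late, work_queue_paused(late))
```

With exactly 625 late runs the check above returns False, matching the "more than 625" wording; clearing late runs until the count drops to the limit or below would, under these assumptions, let new runs be picked up again.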
f
We've had a lot of late runs recently
j
Can you turn off the schedule on some of your flows to clear the number of late runs? (They can be toggled back on afterwards.)
f
I'll check our stack - we've both been clearing them relatively frequently
Across all instances we did have more than 625
I'm just clearing them all now
j
Thanks! Once they are cleared I'm hopeful that your new runs should get picked up. If they don't, please let me know and we'll keep investigating!
f
A couple have started running
Whoop 🎉 Thank you @Jenny
j
Fantastic! Thanks for the update and your patience!
f
We'll have to keep an eye on things going forward and look at why we got to this level of late runs
I'm going to get it all back into a good place now and I'll let you know if there's anything else that seems a bit off 🙂
j
Sounds good. Thank you!
k
Nice, thanks!