# prefect-community
g
A number of flows in Prefect Cloud are stuck in Submitted and the zombie killer is not putting them to bed; any suggestions on getting them processed?
n
Hi @George Coyne - that doesn't sound out of the ordinary: the zombie killer is only responsible for dealing with running tasks that haven't issued a heartbeat in the last 2 minutes. For flows stuck in a Submitted state, the Lazarus process is responsible for re-scheduling those roughly every 6 minutes, up to a maximum of 10 reschedules
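If it helps to see what's actually stuck, here's a rough sketch using the 0.x Python Client's `graphql` method (the query shape is an assumption about Prefect Cloud's Hasura-style schema, so adjust field names as needed):
```python
from prefect import Client

client = Client()

# Sketch: list flow runs currently sitting in a Submitted state.
# The where-clause shape assumes a Hasura-style flow_run table.
result = client.graphql(
    """
    query {
      flow_run(where: { state: { _eq: "Submitted" } }) {
        id
        name
        state
      }
    }
    """
)
print(result)
```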
g
Ah true
OK, manually going through and setting state to Scheduled
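For anyone else doing this at scale, a minimal sketch with the 0.x Python Client (the exact `set_flow_run_state` signature, e.g. whether `version` is required, may differ across versions):
```python
from prefect import Client
from prefect.engine.state import Scheduled

client = Client()
run_id = "3c45f67b-ffe9-4f9c-b0fc-da48b4a4f31d"  # a stuck flow run from this thread

# Fetch the run's current version, then push it back to Scheduled so an
# agent will pick it up again.
info = client.get_flow_run_info(run_id)
client.set_flow_run_state(
    flow_run_id=run_id,
    version=info.version,
    state=Scheduled("Manually rescheduled from Submitted"),
)
```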
I have tasks queuing in Submitted. I recently updated the agent; other than that, nothing has changed in our cluster.
d
Hey @George Coyne what’s your new agent version?
g
0.12.6
d
We’re checking one thing on our end, just a moment
In the meantime, anything look funky in your agent logs?
Flow run logs also clear? You’re running on Kubernetes, right? Are jobs being created properly?
Your work queue looks like it’s functioning properly to us
g
Flow run logs are good, running on k8s. It just looks like some jobs aren’t actually queueing or something? I’m digging into it a little deeper, but wanted to bring it to you guys sooner rather than later
If I manually set these Submitted flows back to Scheduled, they get picked up
d
Hey @George Coyne just to double check: you’re saying that the first time the run is submitted the agent doesn’t create a k8s job, but the second time the run goes into a Submitted state the agent properly creates a job?
g
Inconsistently, but yes: some flows get pulled perfectly, some just don’t. I can’t see any consistent pattern in the behavior.
d
In the logs for the flow run, there should be a job ID for the created job. Take a look at the jobs created with those IDs. What state are the jobs in? Are they “Unschedulable”?
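The same check, sketched with the official Kubernetes Python client in case it’s easier to script than `kubectl describe job` (the job name and namespace below are example values):
```python
from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()
core = client.CoreV1Api()

JOB_NAME = "prefect-job-8ffbfe27"  # example: job ID taken from the flow run logs
NAMESPACE = "default"              # assumption: agent submits jobs to "default"

# A 404 here means the agent never actually created the job
job = batch.read_namespaced_job(JOB_NAME, NAMESPACE)
print("active:", job.status.active,
      "succeeded:", job.status.succeeded,
      "failed:", job.status.failed)

# Events surface scheduling failures, e.g. FailedScheduling / Unschedulable
events = core.list_namespaced_event(
    NAMESPACE, field_selector=f"involvedObject.name={JOB_NAME}"
)
for ev in events.items:
    print(ev.reason, ev.message)
```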
g
That’s the thing, the jobs just don’t appear
Checking agent logs
d
🧐
g
prefect-job-8ffbfe27
for instance, stuck in Submitted since 11:00 CDT
Does not appear in agent logs, does not appear in kube jobs
d
Would you shoot me that flow run ID?
g
3c45f67b-ffe9-4f9c-b0fc-da48b4a4f31d
d
Is that flow run present in your agent logs?
Crazy question: do you have a second agent running?
j
Also one thing to confirm when checking for the job:
```
kubectl get jobs --all-namespaces
```
The fact that it’s stuck in submitted with a job ID attached means the agent is submitting it to k8s 🤔
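One way to double-check the agent side is to scan the agent pod’s logs for the flow run ID; a sketch with the Kubernetes Python client (the `app=prefect-agent` label selector and namespace are assumptions about your deployment):
```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

FLOW_RUN_ID = "3c45f67b-ffe9-4f9c-b0fc-da48b4a4f31d"
NAMESPACE = "default"  # assumption

# Find agent pods and scan their logs for any mention of the stuck run
pods = core.list_namespaced_pod(NAMESPACE, label_selector="app=prefect-agent")
for pod in pods.items:
    logs = core.read_namespaced_pod_log(pod.metadata.name, NAMESPACE)
    for line in logs.splitlines():
        if FLOW_RUN_ID in line:
            print(pod.metadata.name, line)
```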
g
I do have a second agent running
I spun up a 0.11.5 agent this morning
d
ah, so it’s not picking up runs accidentally?
g
No jobs in either cluster as of right now
and I checked both agent logs for the flow mentioned above
Will the second agent cause an issue?
d
Both agents will pick up flows unless you add labels to both agents and flows
if this is what you’re looking to do, then that’s totally fine
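For reference, a minimal sketch of label matching in 0.x: labels set on the flow’s environment must match labels the agent was started with (the label and project names here are examples, and the agent-side flag, e.g. `--label`, may vary by version):
```python
from prefect import Flow, task
from prefect.environments import LocalEnvironment

@task
def say_hello():
    print("hello")

with Flow("example-flow") as flow:
    say_hello()

# Only agents started with a matching label (e.g. "k8s-prod") will pick
# this flow up; an unlabeled agent will ignore it.
flow.environment = LocalEnvironment(labels=["k8s-prod"])
flow.register(project_name="example-project")
```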
g
Yep I love that too, but in this situation both are running without a label
d
Gotcha, so you’re just trying to get an agent to submit a flow run?
Are the agents in the same cluster?
g
Agents are in separate clusters
d
Is the new agent performing better?
g
I am trying to understand why flows are getting submitted without running, and to stop it
d
Understood
Is the new agent accomplishing that goal?
g
Nope
I was blindly hoping that it would catch whatever was being skipped/missed
I’ll try removing the 0.12.6 agent