George Coyne

08/05/2020, 8:53 PM
A number of flows in prefect cloud are stuck in submitted and zombie killer is not putting them to bed, any suggestions on getting them processed?

nicholas

08/05/2020, 9:07 PM
Hi @George Coyne - that doesn't sound out of the ordinary. The zombie killer is only responsible for running tasks that haven't issued a heartbeat in the last 2 minutes. For flows stuck in a Submitted state, the Lazarus process is responsible for rescheduling them roughly every 6 minutes, up to a maximum of 10 reschedules.
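To make the split of responsibilities concrete, here is a minimal sketch of the rule as described in this message. It is not Prefect Cloud's actual implementation: the `Run` class, `responsible_process`, and the constant names are hypothetical, and the sketch ignores the roughly 6 minute interval between Lazarus passes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Thresholds quoted in the message above; the names are made up for this sketch.
HEARTBEAT_TIMEOUT = timedelta(minutes=2)
MAX_RESCHEDULES = 10


@dataclass
class Run:
    state: str                                # e.g. "Running" or "Submitted"
    last_heartbeat: Optional[datetime] = None
    reschedules: int = 0


def responsible_process(run: Run, now: datetime) -> Optional[str]:
    """Return which cleanup process would act on this run, or None."""
    if run.state == "Running" and run.last_heartbeat is not None:
        # Zombie killer: running work with no heartbeat in the last 2 minutes.
        if now - run.last_heartbeat > HEARTBEAT_TIMEOUT:
            return "zombie killer"
        return None
    if run.state == "Submitted":
        # Lazarus: reschedules stuck Submitted runs, up to 10 times.
        if run.reschedules < MAX_RESCHEDULES:
            return "lazarus"
    return None
```

Under this reading, a run stuck in Submitted is never a candidate for the zombie killer, which matches what is being seen here.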

George Coyne

08/05/2020, 9:07 PM
Ah true
OK manually going through and setting state to scheduled
I have tasks queuing in submitted, I recently updated the agent, other than that nothing has changed in our cluster.

Dylan

08/05/2020, 10:39 PM
Hey @George Coyne what’s your new agent version?

George Coyne

08/05/2020, 10:47 PM
0.12.6

Dylan

08/05/2020, 10:52 PM
We’re checking one thing on our end, just a moment
In the meantime, anything look funky in your agent logs?
Flow Run logs also clear? You’re running on kubernetes right? Are jobs being created properly?
Your work queue looks like it’s functioning properly to us

George Coyne

08/05/2020, 11:51 PM
Flow run logs are good, running on k8s, it just looks like some jobs aren’t actually queueing or something? I’m digging into it a little deeper but wanted to bring it to you guys sooner rather than later
If I manually set these submitted flows back to scheduled, they get picked up

Dylan

08/06/2020, 12:29 AM
Hey @George Coyne just to double check, you’re saying that the first time the run is submitted the agent doesn’t create a k8s job but the second time the run goes into a submitted state the agent properly creates a job?

George Coyne

08/06/2020, 12:54 AM
Inconsistently but yes, some flows get pulled perfectly, some just don’t. I can’t determine any consistency to the behavior.

Dylan

08/06/2020, 1:42 AM
In the logs for the flow run, there should be a job ID for the created job. Take a look at the jobs created with those IDs. What state are the jobs in? Are they “Unschedulable”?

George Coyne

08/06/2020, 4:37 PM
That’s the thing, the jobs just don’t appear
Checking agent logs

Dylan

08/06/2020, 4:37 PM
🧐

George Coyne

08/06/2020, 4:39 PM
prefect-job-8ffbfe27
for instance, stuck in submitted since 11:00 cdt
Does not appear in agent logs, does not appear in kube jobs

Dylan

08/06/2020, 4:40 PM
Would you shoot me that flow run ID?

George Coyne

08/06/2020, 4:42 PM
3c45f67b-ffe9-4f9c-b0fc-da48b4a4f31d

Dylan

08/06/2020, 5:01 PM
Is that flow run present in your agent logs?
Crazy question: do you have a second agent running?

josh

08/06/2020, 5:07 PM
Also one thing to confirm when checking for the job:
kubectl get jobs --all-namespaces
The fact that it’s stuck in submitted with a job ID attached means the agent is submitting it to k8s 🤔
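If the job list is long, the check can be scripted. The sketch below is only an illustration: `find_jobs` is a hypothetical helper, the sample output is made up, and it assumes the default table layout of `kubectl get jobs --all-namespaces`, where the first column is NAMESPACE and the second is NAME.

```python
from typing import List, Tuple


def find_jobs(kubectl_output: str, name_fragment: str) -> List[Tuple[str, str]]:
    """Return (namespace, job name) pairs whose job name contains name_fragment."""
    matches = []
    # Skip the header row, then read the first two whitespace-separated columns.
    for line in kubectl_output.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 2 and name_fragment in fields[1]:
            matches.append((fields[0], fields[1]))
    return matches


# Made-up sample output, for illustration only.
SAMPLE = """NAMESPACE   NAME                   COMPLETIONS   DURATION   AGE
default     prefect-job-aaaa1111   1/1           42s        3h
data        prefect-job-bbbb2222   0/1           5m         5m
"""

print(find_jobs(SAMPLE, "prefect-job-8ffbfe27"))
print(find_jobs(SAMPLE, "prefect-job-bbbb2222"))
```

An empty result for the job ID from the flow run logs would confirm that the job does not exist in any namespace of that cluster.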

George Coyne

08/06/2020, 5:24 PM
I do have a second agent running
I spun up a 0.11.5 agent this morning

Dylan

08/06/2020, 5:25 PM
ah, so it’s not picking up runs accidentally?

George Coyne

08/06/2020, 5:25 PM
No jobs in either cluster as of right now
and I checked both agent logs for the flow mentioned above
Will the second agent cause an issue?

Dylan

08/06/2020, 5:25 PM
Both agents will pick up flows unless you add labels to both agents and flows
if this is what you’re looking to do, then that’s totally fine
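As a rough illustration of why two unlabeled agents compete for the same runs, the sketch below assumes the matching rule is that an agent picks up a flow only when every label on the flow is also present on the agent. The function names, agent names, and label values are hypothetical.

```python
from typing import Dict, Iterable, List


def agent_can_pick_up(agent_labels: Iterable[str], flow_labels: Iterable[str]) -> bool:
    """Assumed rule: the flow's labels must be a subset of the agent's labels."""
    return set(flow_labels) <= set(agent_labels)


def eligible_agents(agents: Dict[str, List[str]], flow_labels: Iterable[str]) -> List[str]:
    """Return the names of all agents that could pick up a flow with these labels."""
    return sorted(
        name for name, labels in agents.items()
        if agent_can_pick_up(labels, flow_labels)
    )


# Two unlabeled agents and an unlabeled flow: either agent may take the run.
unlabeled = {"agent-a": [], "agent-b": []}
print(eligible_agents(unlabeled, []))

# Labels on both agents and the flow: only the matching agent qualifies.
labeled = {"agent-a": ["cluster-a"], "agent-b": ["cluster-b"]}
print(eligible_agents(labeled, ["cluster-a"]))
```

Under this assumption, labeling both agents and the flows pins each run to a single cluster.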

George Coyne

08/06/2020, 5:27 PM
Yep I love that too, but in this situation both are running without a label

Dylan

08/06/2020, 5:27 PM
Gotcha, so you’re just trying to get an agent to submit a flow run?
Are the agents in the same cluster?

George Coyne

08/06/2020, 5:27 PM
Agents are in separate clusters

Dylan

08/06/2020, 5:28 PM
Is the new agent performing better?

George Coyne

08/06/2020, 5:28 PM
I am trying to understand why flows are getting submitted without running, and to stop it

Dylan

08/06/2020, 5:28 PM
Understood
Is the new agent accomplishing that goal?

George Coyne

08/06/2020, 5:32 PM
Nope
I was blindly hoping that it would catch whatever was being skipped/missed
I’ll try removing the 0.12.6 agent