redsquare

11/10/2022, 10:01 AM
Can we get /health and /stats endpoints on an agent so we can monitor it and cycle it if needed?
Christopher Boyd

11/10/2022, 3:04 PM
Hi redsquare, is there a specific case where you are seeing this being necessary?
Not saying that it’s a bad idea, but I’m curious whether there is another problem occurring that makes you feel this is missing.
The agent does not natively run any sort of web server and has no listening connections, as it is outbound only.
However, you can add a liveness probe that tests whether the prefect agent is still responsive, with something like this:
livenessProbe:
  exec:
    command:
      - prefect
      - agent
      - '--help'
  initialDelaySeconds: 5
  periodSeconds: 5
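For context, here is a minimal sketch of where such a probe would sit in an agent Deployment. The Deployment name `prefect-agent`, the work queue name `default`, the image tag, and the `timeoutSeconds` value are assumptions for illustration and do not come from this thread; environment settings such as the API URL are omitted. Note that an exec probe starts a fresh `prefect` process inside the container, so it confirms that the CLI still runs there rather than inspecting the running agent process itself.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefect-agent                        # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prefect-agent
  template:
    metadata:
      labels:
        app: prefect-agent
    spec:
      containers:
        - name: agent
          image: prefecthq/prefect:2-latest  # assumed image tag
          # Assumed start command and work queue name
          command: ["prefect", "agent", "start", "-q", "default"]
          livenessProbe:
            exec:
              command:
                - prefect
                - agent
                - '--help'
            initialDelaySeconds: 5
            periodSeconds: 5
            # Assumption: the Kubernetes default timeout of 1 second may be
            # too short for a Python CLI to start, so allow more time.
            timeoutSeconds: 10
```

If the probe command fails or times out repeatedly, Kubernetes restarts the container, which covers the "cycle if needed" part of the request without the agent exposing an HTTP endpoint.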
redsquare

11/10/2022, 3:50 PM
Hey @Christopher Boyd, we had network blips on the agent overnight, and 2 flows were left in a Pending state as a result.
Do pending jobs ever recover if the agent couldn’t kick the job off?
Do they eventually get set to Failed?
Christopher Boyd

11/10/2022, 4:04 PM
You mean pending Kubernetes jobs?
Or Pending Prefect flow runs?
If it’s a Kubernetes job that was started, that wouldn’t be a Prefect issue at that point. It was handed off to k8s to execute, and I would be looking at the logs there for why it’s stuck in pending (potentially insufficient CPU/memory, not enough nodes, etc.).
redsquare

11/10/2022, 4:07 PM
Yeah, so we had socket errors on the agent. Pending then means the agent accepted the job but has not reported back,
and it will stay in that state given this scenario.
Christopher Boyd

11/10/2022, 4:12 PM
Hmm, I’m not sure what the expected behavior is in this case. I will need to check with the team and report back.
Do you have logs for the agent?
redsquare

11/10/2022, 4:26 PM
Sadly not. We re-installed the agent over the course of the day. I will see what happens tonight.