Can we get a /health + /stats endpoint on an agent so we can monitor and cycle if needed | Prefect Community #ask-community



# ask-community

redsquare

11/10/2022, 10:01 AM

Can we get a /health + /stats endpoint on an agent so we can monitor it and cycle it if needed?
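For illustration only, here is a minimal sketch of what such endpoints could look like, using just the Python standard library. The Prefect agent does not expose anything like this; the sketch assumes a hypothetical wrapper process in which the agent's polling loop calls a hypothetical `record_poll()` hook, and `MAX_POLL_AGE_SECONDS`, `STATS`, and `start_health_server()` are likewise made-up names. `/health` returns 503 when the loop has gone quiet, which a Kubernetes `httpGet` probe would count as a failure.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical threshold: report unhealthy if the poll loop has been silent this long.
MAX_POLL_AGE_SECONDS = 60

# Hypothetical in-process counters; the agent loop itself would have to update them.
STATS = {"started_at": time.time(), "last_poll_at": None, "polls": 0}
STATS_LOCK = threading.Lock()


def record_poll():
    """Hypothetical hook: call once after every successful poll of the work queue."""
    with STATS_LOCK:
        STATS["last_poll_at"] = time.time()
        STATS["polls"] += 1


class HealthHandler(BaseHTTPRequestHandler):
    def _send_json(self, code, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore any query string
        with STATS_LOCK:
            snapshot = dict(STATS)
        now = time.time()
        if path == "/health":
            last = snapshot["last_poll_at"]
            healthy = last is not None and now - last <= MAX_POLL_AGE_SECONDS
            # 503 makes a Kubernetes httpGet probe treat a stalled loop as a failure.
            self._send_json(200 if healthy else 503, {"status": "ok" if healthy else "stale"})
        elif path == "/stats":
            snapshot["uptime_seconds"] = round(now - snapshot["started_at"], 1)
            self._send_json(200, snapshot)
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, format, *args):
        pass  # keep per-request access logs out of the agent's stderr


def start_health_server(host="0.0.0.0", port=8080):
    """Serve /health and /stats from a daemon thread; returns the server for shutdown()."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to `0.0.0.0` lets the kubelet reach the port from outside the container; a real setup would also need the port declared on the pod.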

Christopher Boyd

11/10/2022, 3:04 PM

Hi redsquare - is there a specific case where you're seeing this as necessary?

Christopher Boyd

11/10/2022, 3:04 PM

Not saying it's a bad idea, but I'm curious whether there's another underlying problem you're running into that this would catch.

Christopher Boyd

11/10/2022, 3:30 PM

The agent doesn't natively run any sort of webserver, and it has no listening connections, since all of its traffic is outbound.

Christopher Boyd

11/10/2022, 3:31 PM

However, you can add a liveness probe that checks whether the `prefect agent` command is still responsive, with something like this:


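livenessProbe:
  exec:
    # Runs the CLI's help output inside the container and exits 0;
    # it does not query the already-running agent process.
    command:
      - prefect
      - agent
      - '--help'
  initialDelaySeconds: 5
  periodSeconds: 5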

redsquare

11/10/2022, 3:50 PM

Hey @Christopher Boyd, we had network blips on the agent overnight, and 2 flow runs were left in a Pending state as a result.

redsquare

11/10/2022, 4:00 PM

Do Pending jobs ever recover if the agent couldn't kick the job off?

redsquare

11/10/2022, 4:01 PM

Or do they eventually get set to Failed?

Christopher Boyd

11/10/2022, 4:04 PM

You mean pending kubernetes jobs?

Christopher Boyd

11/10/2022, 4:04 PM

Or Pending Prefect flow runs?

Christopher Boyd

11/10/2022, 4:05 PM

If it's a Kubernetes job that was started, that wouldn't be a Prefect issue at that point: it was handed off to k8s to execute, and I would be looking at the logs there for why it's stuck in Pending (insufficient CPU/memory, not enough nodes, etc.).
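On the Kubernetes side, the reason a pod is stuck in Pending is recorded in its status. As a minimal sketch, this summarizes the output of `kubectl get pods -o json`: scheduling problems such as insufficient CPU/memory appear as a `PodScheduled` condition with status `"False"`, and pods that were scheduled but can't start report per-container `waiting` reasons. The function name is hypothetical; `kubectl describe pod` shows the same information as events.

```python
import json


def pending_pod_reasons(pods_json):
    """Summarize why pods are Pending, from `kubectl get pods -o json` output.

    Accepts either the raw JSON string or an already-parsed dict.
    Returns a list of (pod_name, [reason, ...]) tuples.
    """
    data = json.loads(pods_json) if isinstance(pods_json, str) else pods_json
    results = []
    for pod in data.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        name = pod.get("metadata", {}).get("name", "<unknown>")
        reasons = []
        # Not scheduled yet: e.g. insufficient cpu/memory or no matching nodes.
        for cond in status.get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                reasons.append(f"{cond.get('reason')}: {cond.get('message')}")
        # Scheduled but not started: e.g. ImagePullBackOff, ContainerCreating.
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                reasons.append(f"{cs.get('name')}: {waiting.get('reason')}")
        results.append((name, reasons or ["no reason reported yet"]))
    return results
```

For example, piping `kubectl get pods -n <namespace> -o json` into this function lists only the Pending pods along with whatever reason Kubernetes has recorded for each.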

redsquare

11/10/2022, 4:07 PM

Yeah, so we had socket errors on the agent - Pending in that case means the agent accepted the job but never reported back.

redsquare

11/10/2022, 4:08 PM

and it will stay in that state given this scenario
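Whether Prefect eventually moves such runs to Failed on its own is not settled in this thread. As a stopgap, a periodic check could flag runs that have sat in Pending longer than some limit. This is a minimal, standard-library sketch of that filtering step only: `FlowRunRecord` is a hypothetical, simplified stand-in for whatever an API query returns, and the 30-minute limit in the usage below is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FlowRunRecord:
    """Hypothetical, simplified view of a flow run from an API query."""
    id: str
    state: str                  # e.g. "PENDING", "RUNNING", "COMPLETED"
    state_entered_at: datetime  # timezone-aware time of the last state change


def find_stuck_pending(runs, max_pending, now=None):
    """Return runs stuck in PENDING for longer than `max_pending`, oldest first."""
    if now is None:
        now = datetime.now(timezone.utc)
    stuck = [
        r for r in runs
        if r.state == "PENDING" and now - r.state_entered_at > max_pending
    ]
    return sorted(stuck, key=lambda r: r.state_entered_at)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    runs = [
        FlowRunRecord("run-a", "PENDING", now - timedelta(hours=8)),
        FlowRunRecord("run-b", "PENDING", now - timedelta(minutes=2)),
        FlowRunRecord("run-c", "RUNNING", now - timedelta(hours=8)),
    ]
    for run in find_stuck_pending(runs, timedelta(minutes=30), now=now):
        print(run.id)
```

Marking the flagged runs as Failed would be a separate step through the Prefect client or UI, not shown here.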

Christopher Boyd

11/10/2022, 4:12 PM

Hrmmm, I'm not sure what the expected behavior is in this case. I'll need to check with the team and report back.

Christopher Boyd

11/10/2022, 4:12 PM

do you have logs for the agent?

redsquare

11/10/2022, 4:26 PM

sadly not - we re-installed the agent over the course of the day - I will see what happens tonight
