# ask-community
j
I've found some stuff about the Prefect 1 agent having a healthcheck, but I can't figure out if it still applies to the Prefect 2 agent. Has anyone got a healthcheck set up for their agent?
b
Hey Joseph, I'm not familiar with an equivalent health check for 2.0 agents. However, workers do support running a health check server.
The flag exposes a separate `/health-check` endpoint on the worker container for something like k8s to send health check requests to.
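For illustration, here is a minimal sketch of the check a probe performs against such an endpoint. It assumes the worker's `/health-check` endpoint is reachable over plain HTTP at a URL you supply. The function name `is_healthy` is hypothetical, and this is not Prefect's own client. Like a Kubernetes HTTP probe, it treats any status from 200 to 399 within the timeout as healthy, and everything else (error status, refused connection, timeout) as unhealthy.

```python
import http.client
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Probe `url` with an HTTP GET, the way an HTTP liveness check would.

    Returns True only if a response with a status from 200 to 399 arrives
    within `timeout` seconds. Hypothetical helper, not part of Prefect.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # urllib raises HTTPError (an OSError subclass) for 4xx/5xx statuses;
        # refused connections and timeouts also surface as OSError subclasses.
        return False
```

For example, `is_healthy("http://localhost:8080/health-check")`, where the host and port are placeholders for wherever your worker actually listens.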
If you wouldn't mind sharing a bit about what you'd like to implement, that'd be helpful as well 🙌
j
I don't think we've upgraded to a version that has workers; we're on 2.8.3 at the moment. I want exactly the thing you've described for the worker, but right now we're using an agent. (The docs say workers are a beta release and subject to change, and I'd like to avoid reworking agents into workers and then reworking those workers into workers 2.0.)

At the moment we get occasional unrecoverable errors in the Prefect agent when it gets a 5xx from Prefect Cloud being unreachable (maybe one, maybe many; I haven't counted). These seem to kill the running Prefect process in some manner, so work stops getting run while flow runs sit in a Pending state. If there were a health check, I could do something similar to your k8s setup via AWS ECS to restart the container.

I'll probably just end up hooking up a restarter Lambda off the back of a CloudWatch alarm based on the CPU utilization I've seen in the idle-failed and idle-working states.

Mostly for anyone searching in the future: our agent runs as a service in AWS ECS using the minimum Fargate allocation. From the metrics I have, in the unrecoverable state it idles at ~0.25% CPU utilization, whereas idling in a working state it is mostly >0.4%.
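The CPU heuristic described above could be reduced to a small decision function, for example as the core of the restarter Lambda. This is a sketch under assumptions: the samples are the ECS service's `CPUUtilization` percentages, newest last; the name `looks_stuck`, the 0.3% threshold and the five-sample window are hypothetical choices placed between the ~0.25% and >0.4% figures reported here, not tested values. Requiring every sample in the window to be low mirrors a CloudWatch alarm configured as "N out of N" datapoints breaching, so one quiet interval does not trigger a restart.

```python
from collections.abc import Sequence

# Hypothetical cut-off between the observed idle-failed (~0.25%) and
# idle-working (mostly >0.4%) CPU levels on the minimum Fargate allocation.
STUCK_CPU_THRESHOLD = 0.3


def looks_stuck(
    cpu_percent_samples: Sequence[float],
    threshold: float = STUCK_CPU_THRESHOLD,
    window: int = 5,
) -> bool:
    """Return True if the last `window` CPU samples are all below `threshold`.

    With fewer than `window` samples there is not enough evidence, so the
    function returns False rather than risk restarting a healthy agent.
    """
    if len(cpu_percent_samples) < window:
        return False
    recent = cpu_percent_samples[-window:]
    return all(sample < threshold for sample in recent)
```

Fetching the metrics and performing the restart are left out. If the function returns True, one possible restart mechanism is forcing a new deployment of the ECS service (e.g. boto3's `update_service` with `forceNewDeployment=True`).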