Prefect Community #ask-community

Joseph Thickpenny Ryan

11/21/2023, 3:21 PM

I've found some stuff about the Prefect 1 agent having a healthcheck, but I can't figure out if it still applies to the Prefect 2 agent. Has anyone got a healthcheck set up for their agent?

Bianca Hoch

11/21/2023, 6:50 PM

Hey Joseph, I'm not familiar with an equivalent health check for 2.0 agents. However, for workers, they support running a health check server.

Bianca Hoch

11/21/2023, 6:52 PM

the flag exposes a separate /health-check endpoint on the worker container, for something like k8s to send health check requests to the exposed endpoint
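For anyone wiring this up, here is a minimal sketch of the kind of probe an orchestrator (or a container health check command) could run against such an endpoint. It is an illustration only: the function name `is_healthy`, the host, and the port are assumptions, and the path is simply the one mentioned in the message above. Check your worker's actual address and path before relying on it.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a GET to `url` answers with a 2xx status.

    `opener` is injectable so the check can be exercised without a network;
    by default it is the standard library's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, DNS failures, and the
        # HTTPError that urlopen raises for 4xx/5xx responses.
        return False


# Hypothetical usage; host and port are assumptions, not confirmed in the thread:
# ok = is_healthy("http://localhost:8080/health-check")
```

A wrapper script could exit with a non-zero status when `is_healthy` returns False, which is the signal most container platforms use to mark a container unhealthy.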

Bianca Hoch

11/21/2023, 6:54 PM

if you wouldn't mind sharing a bit about what you'd like to implement, that'd be helpful as well 🙌

Joseph Thickpenny Ryan

11/22/2023, 9:10 AM

I don't think we've upgraded to a version that has workers; we're on 2.8.3 at the moment. I want exactly the thing you've described for the worker, but right now we are using an agent (the docs say workers are a beta release and subject to change, and I'd like to avoid reworking agents to workers and then reworking workers to workers 2.0).

At the moment we get occasional unrecoverable errors in the Prefect agent when it gets a 5xx (maybe one, maybe many, I haven't counted) from Prefect Cloud being unreachable. These seem to kill the running prefect process in some manner, so that work stops getting run but flow runs are left sitting in a Pending state. If there was a health check then I could do something similar to your k8s process via AWS ECS to restart the container. I'll probably just end up hooking up some restarter Lambda off the back of a CloudWatch Alarm, based on the CPU utilization I've seen in the idle failed and idle working states.

Mostly for anyone searching in the future: our agent is running as a service in AWS ECS using the minimum Fargate allocation. From the metrics I have, in the unrecoverable state it idles at ~0.25% CPU utilization, whereas idling in a working state is mostly >0.4%.
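A rough sketch of what such a restarter Lambda might look like, assuming a CloudWatch Alarm on the service's CPU utilization (with a threshold somewhere between the two idle values above) is what invokes it. The environment variable names `AGENT_CLUSTER` and `AGENT_SERVICE` and the function names are hypothetical; the only AWS call used is the ECS `update_service` API with `forceNewDeployment=True`, which asks ECS to replace the service's running tasks.

```python
import os


def restart_service(ecs_client, cluster, service):
    """Ask ECS to start a fresh deployment, replacing the running task(s)."""
    return ecs_client.update_service(
        cluster=cluster,
        service=service,
        forceNewDeployment=True,
    )


def handler(event, context, ecs_client=None):
    """Lambda entry point; the alarm payload in `event` is not inspected here."""
    if ecs_client is None:
        # Imported lazily so the module can be loaded without boto3 installed;
        # boto3 is provided by the AWS Lambda Python runtime.
        import boto3

        ecs_client = boto3.client("ecs")

    # Hypothetical configuration names, set on the Lambda function.
    cluster = os.environ["AGENT_CLUSTER"]
    service = os.environ["AGENT_SERVICE"]

    restart_service(ecs_client, cluster, service)
    return {"restarted": service}
```

The Lambda's execution role would need permission for `ecs:UpdateService` on the agent's service. Because the CPU gap between the two states is small, it is probably worth requiring several consecutive low datapoints before the alarm fires, to avoid restarting a healthy agent.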
