Prem Viswanathan

12/06/2022, 4:10 PM
Hey folks, how can I check the health of a Prefect agent in 2.0? It is running inside a Docker container, and I’m trying to figure out a reliable local health check instead of pinging the Orion server.

Kalise Richmond

12/06/2022, 4:20 PM
Hi @Prem Viswanathan, you should be able to see the agent health check on the work queue in the UI.

Prem Viswanathan

12/06/2022, 4:21 PM
Is there a programmatic way to do it? Ideally from within the container where the agent is running.
So I’m assuming this isn’t really an option? How does the Orion server actually check whether the agent is healthy?

Zanie

12/06/2022, 5:05 PM
The Orion server gets pings from the agent
In v1, we have the agent host a little server that has a /health endpoint; we could do the same here. If the agent process is running it should be healthy though. If it’s running and unhealthy, that’s a bug.
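For illustration, here is a minimal sketch of what a "little server with a /health endpoint" could look like, using only the Python standard library. This is not Prefect code and not how the v1 agent implemented it; the names `HealthHandler` and `start_health_server` and the default port are hypothetical. It only shows that the process hosting the thread is alive and able to answer requests.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200 and any other path with 404."""

    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Silence per-request logging so health probes do not flood the logs.
        pass


def start_health_server(host="127.0.0.1", port=8080):
    """Start the health server on a daemon thread and return the server object.

    The port is an arbitrary choice for this sketch; pass port=0 to let the
    OS pick a free one and read it back from server.server_address.
    """
    server = ThreadingHTTPServer((host, port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

A wrapper script could call `start_health_server()` before launching the agent in the same process, and a container health check could then request `http://127.0.0.1:8080/health`.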

Prem Viswanathan

12/06/2022, 5:09 PM
Okay, so with v2, does the /health endpoint option still exist? Can we enable it?

Zanie

12/06/2022, 5:15 PM
No, we haven’t added one. I don’t see the point of a health endpoint if the process exits when unhealthy. What’s your use case?

Prem Viswanathan

12/06/2022, 5:20 PM
We run a certain number of agents as a service on ECS; each one picks flow runs from a queue and executes them. I’m trying to figure out whether I need an explicit local health check inside the agent task, so that an unhealthy task container gets removed and a replacement agent container is scaled up.
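Since the agent process is expected to exit when it is unhealthy, one possible local check is simply "is the agent process still running?". The sketch below scans `/proc` on Linux for a command line containing `prefect agent`. That needle assumes the agent was started with the `prefect agent start` CLI command, and the function names are hypothetical. It detects only a dead process, not one that is alive but stuck.

```python
import os


def agent_process_running(needle="prefect agent", proc_root="/proc"):
    """Return True if any process command line under proc_root contains needle."""
    own_pid = str(os.getpid())
    for entry in os.listdir(proc_root):
        # Only numeric directories are processes; skip the checker itself.
        if not entry.isdigit() or entry == own_pid:
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            # The process exited between listing and reading, or is unreadable.
            continue
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if needle in cmdline:
            return True
    return False


def health_exit_code(needle="prefect agent", proc_root="/proc"):
    """Map the check to an exit code: 0 for healthy, 1 for unhealthy."""
    return 0 if agent_process_running(needle, proc_root) else 1
```

A container health check command could run a small script that ends with `sys.exit(health_exit_code())`, so the orchestrator sees a non-zero exit code once the agent process is gone.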

Zanie

12/06/2022, 5:25 PM
Are the agents running flows locally or on external infrastructure?

Prem Viswanathan

12/06/2022, 6:20 PM
locally.

Zanie

12/06/2022, 6:23 PM
Ah, it does seem possible for the agent to be “unhealthy” then in that if it is running a bunch of work it may be resource starved and unable to query for more runs.
We’ve got some changes coming to this interface soon. I don’t think we can promise this quickly, but it’s in the works. cc @Jeremiah

Jeremiah

12/06/2022, 6:25 PM
Yes, we’ll be supercharging agents in the very near term. I would think this enhancement will be straightforward on top of those changes

Prem Viswanathan

12/06/2022, 6:30 PM
Got it. Thanks for the input, folks. So sounds like another typical workflow is the agent acting like “an orchestrator” - taking work off the queue and running the flow on a different compute unit?

Zanie

12/06/2022, 6:48 PM
Yeah that’s far more common for production usage so you can allocate resources per flow run.

Prem Viswanathan

12/06/2022, 6:49 PM
Yeah, with ECS tasks, that part takes too long to trigger the run, so we’re trying the workers-on-standby approach.

Zanie

12/06/2022, 6:50 PM
Are your runs adhoc or scheduled?

Prem Viswanathan

12/06/2022, 8:25 PM
adhoc

Zanie

12/06/2022, 9:17 PM
Ah that’s trickier. For scheduled runs, we can submit them early.
For adhoc runs, if latency is important, it sounds like you’ve got the correct solution.
If you open an enhancement request on GitHub, we can track a change to support this.

Prem Viswanathan

12/06/2022, 10:26 PM
Got it; appreciate the input. To confirm, my enhancement request would be a feature request to enable tracking agent health locally, right?

Zanie

12/06/2022, 10:33 PM
👍
I could see us returning information on consumed concurrency slots too, allowing you to know when to scale up.