
Prem Viswanathan

12/06/2022, 4:10 PM

Hey folks, how can I check the health of a Prefect Agent in 2.0? It is running inside a Docker container, and I’m trying to figure out a reliable local health check instead of pinging the Orion server.

Kalise Richmond

12/06/2022, 4:20 PM

Hi @Prem Viswanathan, you should be able to see the agent health check on the work queue in the UI.


Prem Viswanathan

12/06/2022, 4:21 PM

Is there a programmatic way to do it? Ideally from within the container where the agent is running.

Prem Viswanathan

12/06/2022, 4:57 PM

So I’m assuming this isn’t really an option? How does the Orion server actually check whether the agent is healthy?

Zanie

12/06/2022, 5:05 PM

The Orion server gets pings from the agent

Zanie

12/06/2022, 5:06 PM

In v1, we have the agent host a little server that has a `/health` endpoint; we could do the same here. If the agent process is running, it should be healthy, though. If it’s running and unhealthy, that’s a bug.
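For reference, here is a minimal sketch of a liveness endpoint like the one described, using only Python’s standard library. It is not Prefect’s v1 code; the `start_health_server` helper, the port, and the handler are assumptions for illustration. It only proves the process is alive, which matches the point above: it would not catch an agent that is running but stuck.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200 while the process is alive; 404 otherwise."""

    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_health_server(host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Hypothetical helper: serve /health from a daemon thread beside the agent loop."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

The helper would be called once before the agent starts polling, e.g. `start_health_server(port=8080)`; a container health check could then run `curl -f http://localhost:8080/health`, assuming `curl` is available in the image.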


Prem Viswanathan

12/06/2022, 5:09 PM

Okay, so with v2, does the `/health` endpoint option still exist? Can we enable it?

Zanie

12/06/2022, 5:15 PM

No, we haven’t added one. I don’t see the point of a health endpoint if the process exits when it’s unhealthy. What’s your use case?

Prem Viswanathan

12/06/2022, 5:20 PM

We run a certain number of agents as a service on ECS; each one picks flow runs from a queue and executes them. I’m trying to figure out whether I need an explicit local health check within the agent task to trigger removal of that task’s container and the start of a replacement agent container.

Zanie

12/06/2022, 5:25 PM

Are the agents running flows locally or on external infrastructure?


Prem Viswanathan

12/06/2022, 6:20 PM

locally.

Zanie

12/06/2022, 6:23 PM

Ah, then it does seem possible for the agent to be “unhealthy”: if it is running a lot of work, it may be resource-starved and unable to query for more runs.
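One way to catch that starvation case locally, sketched under assumptions: have the agent loop record a timestamp after each successful poll, and let the container health check fail once that timestamp goes stale. The stock 2.0 agent does not write such a file; the heartbeat path, the `record_poll`, `is_healthy`, and `main` helpers, and the 60-second threshold are all hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

# Hypothetical heartbeat location; the agent would need a small wrapper
# that calls record_poll() after each successful query for flow runs.
HEARTBEAT_FILE = Path("/tmp/agent-heartbeat")
MAX_STALENESS_SECONDS = 60.0


def record_poll(path: Path = HEARTBEAT_FILE) -> None:
    """Record the time of the latest successful poll."""
    path.write_text(str(time.time()))


def is_healthy(path: Path = HEARTBEAT_FILE,
               max_staleness: float = MAX_STALENESS_SECONDS,
               now: Optional[float] = None) -> bool:
    """Healthy only if the last recorded poll happened recently enough."""
    try:
        last_poll = float(path.read_text())
    except (FileNotFoundError, ValueError):
        # No heartbeat yet, or a truncated write: report unhealthy.
        return False
    current = time.time() if now is None else now
    return current - last_poll <= max_staleness


def main() -> int:
    """Exit status for a container health check: 0 healthy, 1 unhealthy."""
    return 0 if is_healthy() else 1
```

The health check command would exit with `main()`’s return value (Docker and ECS treat exit status 0 as healthy). The threshold should be comfortably longer than the agent’s polling interval so that a single slow poll does not cause the container to be replaced.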


Zanie

12/06/2022, 6:24 PM

We’ve got some changes coming to this interface soon. I don’t think we can promise this quickly, but it’s in the works. cc @Jeremiah

Jeremiah

12/06/2022, 6:25 PM

Yes, we’ll be supercharging agents in the very near term. I would expect this enhancement to be straightforward on top of those changes.


Prem Viswanathan

12/06/2022, 6:30 PM

Got it. Thanks for the input, folks. So it sounds like another typical workflow is the agent acting as “an orchestrator”: taking work off the queue and running each flow on a different compute unit?

Zanie

12/06/2022, 6:48 PM

Yeah that’s far more common for production usage so you can allocate resources per flow run.

Prem Viswanathan

12/06/2022, 6:49 PM

Yeah, with ECS tasks, that part (launching a task per run) takes too long before the run starts, so we’re trying the approach of keeping workers on standby.

Zanie

12/06/2022, 6:50 PM

Are your runs ad hoc or scheduled?

Prem Viswanathan

12/06/2022, 8:25 PM

Ad hoc.

Zanie

12/06/2022, 9:17 PM

Ah, that’s trickier. For scheduled runs, we can submit them early.
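To illustrate why scheduled runs are easier: their start time is known in advance, so a worker can pick them up a little before they are due and have infrastructure starting while the clock runs down. This is a minimal sketch, not Prefect’s API; the `FlowRun` type, the `runs_to_submit` helper, and the 60-second prefetch window are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class FlowRun:
    """Hypothetical stand-in for a queued flow run."""
    id: str
    scheduled_start: datetime


def runs_to_submit(runs: List[FlowRun], now: datetime,
                   prefetch: timedelta = timedelta(seconds=60)) -> List[FlowRun]:
    """Pick runs that are due now or will be due within the prefetch window.

    A scheduled run can be picked up as much as `prefetch` before its start
    time. An ad hoc run is modeled here as scheduled for the moment it is
    created, so the window gives it no head start.
    """
    cutoff = now + prefetch
    due = [run for run in runs if run.scheduled_start <= cutoff]
    return sorted(due, key=lambda run: run.scheduled_start)
```

Because an ad hoc run is already due when it appears, the only way to hide startup latency for it is to have capacity already running, which is the standby-agent approach discussed here.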

Zanie

12/06/2022, 9:18 PM

For ad hoc runs, if latency is important, it sounds like you’ve got the right solution.

Zanie

12/06/2022, 9:18 PM

If you open an enhancement request on GitHub, we can track a change to support this.


Prem Viswanathan

12/06/2022, 10:26 PM

Got it; appreciate the input. To confirm, my enhancement request would be a feature request to enable tracking agent health locally, right?

Zanie

12/06/2022, 10:33 PM

👍

Zanie

12/06/2022, 10:34 PM

I could see us returning information on consumed concurrency slots too, allowing you to know when to scale up.
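If agents did report consumed concurrency slots, the scale-up decision on the ECS side could be as simple as this sketch. Nothing in this thread says the agent exposes this yet; the `desired_agent_count` helper, its inputs, the 80% utilization target, and the min/max bounds are all hypothetical.

```python
import math
from typing import Sequence


def desired_agent_count(slots_used: Sequence[int], slots_per_agent: int,
                        target_utilization: float = 0.8,
                        min_agents: int = 1, max_agents: int = 10) -> int:
    """Return how many standby agents to run so slot utilization stays at or below target.

    slots_used: consumed concurrency slots reported by each running agent.
    slots_per_agent: concurrency limit each agent is configured with.
    """
    if slots_per_agent <= 0:
        raise ValueError("slots_per_agent must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    total_used = sum(slots_used)
    # Capacity needed so that total_used / capacity <= target_utilization.
    needed_capacity = total_used / target_utilization
    needed_agents = math.ceil(needed_capacity / slots_per_agent)
    # Keep at least min_agents on standby and never exceed max_agents.
    return max(min_agents, min(max_agents, needed_agents))
```

For example, four agents that each allow 4 concurrent runs and report 14 slots in use would need 14 / 0.8 = 17.5 slots of capacity, so the sketch asks for 5 agents. Keeping `min_agents` above zero preserves at least one warm agent for latency-sensitive ad hoc runs.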



Prefect Community

Bring your towel and join one of the fastest growing data communities. Welcome to our second-generation open source orchestration platform, a completely rethought approach to dataflow automation.