Is there a healthcheck route we can use for an `ag...
# prefect-community
l
Is there a healthcheck route we can use for an `agent`/`executor` docker service? I saw a Feb 5 PR around some storage healthchecks, but not seeing docs for instrumented monitoring here. Ideally something curl-able, like:
healthcheck:
      test: ["CMD-SHELL", "curl -sSf http://prefect/health | jq .code | grep 200 || exit 1"]
👍 1
c
Hey Leo! Which agent are you running and where? The k8s agent has a health check route but none of the other ones do at the moment. Is the goal to trigger a restart in the event the health check fails?
l
we're trying to figure out secure (federated) & observable & auto-restarting executors. basic soln we came to is docker-compose executors put onto every server donated to us. the toy setup used a prefect container base, but we'll be switching to an nvidia rapids base image and pip/conda installing prefect on top (we're trying to reuse gpu/pydata/etc. deps across all envs, so won't use prefect base). we're using standard docker monitoring etc. stacks, so hoping for some in-process rest endpoint we can just poll on.
looking at the `Dockerfile`, I see:
RUN prefect backend server
(I also see `prefect agent start`, but the intent here is operating the executor, not the agent)
(my understanding is `executor` = task runner = `backend server`, while `agent` is a client interface for submitting a job... still learning the lingo!)
c
when you’re talking about “executor health” what are you referring to? The only executor type that is long-lasting is a Dask executor but you don’t appear to be doing anything with Dask
l
We have a growing number of compute servers. Each one runs a docker container with `prefect` installed, and afaict, `prefect` will poll the central server for new tasks. We can make sure the docker container itself stays running, but not sure how to tell if `prefect` gets wedged.
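One common way to detect a wedged process from outside is a heartbeat file: the long-running loop touches a file on each iteration, and the container healthcheck fails once the file goes stale. The sketch below is only an illustration of that pattern. Nothing in Prefect writes such a file, so something in the polling loop would have to call the hypothetical `touch_heartbeat()`; the path, the threshold, the environment variable names, and both function names are assumptions.

```python
import os
import sys
import time

# Hypothetical path and threshold; neither comes from Prefect.
HEARTBEAT_PATH = os.environ.get("HEARTBEAT_PATH", "/tmp/agent.heartbeat")
MAX_AGE_SECONDS = float(os.environ.get("HEARTBEAT_MAX_AGE", "60"))


def touch_heartbeat(path=HEARTBEAT_PATH):
    """Record 'still alive' by updating the file's modification time."""
    with open(path, "a"):
        pass  # create the file if it does not exist yet
    os.utime(path, None)


def is_fresh(path=HEARTBEAT_PATH, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the heartbeat file exists and was touched recently."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return False  # a missing file counts as unhealthy
    return age <= max_age


if __name__ == "__main__":
    # Exit code 0 = healthy, 1 = unhealthy, which is what a
    # docker-compose healthcheck test command expects.
    sys.exit(0 if is_fresh() else 1)
```

Saved as a script, this could be the healthcheck test command in place of the `curl` pipeline, with the caveat that it only proves whatever calls `touch_heartbeat()` is still looping.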
c
It sounds like you’re running multiple Prefect agents on various servers; what type of agents are you running? Local Agents?
l
Not sure. The prototype has `RUN prefect backend server`. Ultimately these will be rapids.ai tasks + neural network stuff. We're not using prefect for its dask capabilities, just task dispatch & reporting.
Happy to follow recommendations!
c
`prefect backend server` doesn’t really do anything, it just updates your local user configuration to point to `localhost:4200` for the API instead of Cloud
l
Ah I see our entrypoint.sh also has:
# Keep the container running
prefect agent start
c
got it, yea that’s a Local Agent then 👍
l
But we can switch to whatever other agent, we're trying to figure out the right way to do it
(we already have the central UI server running elsewhere behind a bastion, and using VPC rules to allow direct central UI <> executor flows)
c
yea, local agent is perfectly fine; the local agent will submit flows to run in subprocesses. Generally we recommend using `supervisord` to manage the parent process running the agent
l
So the Dockerfile should install supervisor and run the parent agent via that. Except does the parent agent have a healthcheck in case it gets wedged?
c
yup exactly
unfortunately not natively; supervisord might though
l
Yeah supervisord has a healthcheck framework, but it still comes down to the prefect agent process having some way of doing a "oh hi, yep just checked, I am indeed OK for what I consider OK to mean"
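The "yep, I am indeed OK" signal described here could take the form of a small in-process HTTP endpoint backed by a heartbeat timestamp. The following is a minimal standard-library sketch of that idea, not anything the Prefect agent provides: `Heartbeat`, `make_handler`, `start_health_server`, the `/health` path, the port, and the JSON shape are all hypothetical, and the process's main loop would need to call `beat()` itself.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class Heartbeat:
    """Thread-safe timestamp of the last successful loop iteration."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = time.monotonic()

    def beat(self):
        with self._lock:
            self._last = time.monotonic()

    def age(self):
        with self._lock:
            return time.monotonic() - self._last


def make_handler(heartbeat, max_age):
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            age = heartbeat.age()
            # 200 while the loop is beating, 503 once it has gone quiet.
            status = 200 if age <= max_age else 503
            body = json.dumps(
                {"code": status, "age_seconds": round(age, 3)}
            ).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep health polls out of the process logs

    return HealthHandler


def start_health_server(heartbeat, host="127.0.0.1", port=8080, max_age=60.0):
    """Serve /health from a daemon thread; returns the server object."""
    server = HTTPServer((host, port), make_handler(heartbeat, max_age))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With something like this running inside the process, a check of the `curl -sSf ... | jq .code | grep 200` form would pass while the loop keeps calling `beat()` and fail once the last beat is older than `max_age`.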
c
we could definitely look into that as an enhancement though!
yea we could definitely add that, care to open an issue on GitHub?
l
Yep!
c
awesome thank you Leo!
l
`local agent`, right?
c
yup yup
l
https://github.com/PrefectHQ/prefect/issues/2313 There are fancier forms of this, so just wrote the bog simple & standard one :)
💯 1