Matthieu Lhonneux

10/19/2022, 8:33 AM
Hi, I migrated from Prefect v1 to Prefect Orion (I love the Prefect Orion CLI 😄 thanks for that!). I have a lot of questions to ask, but I'll start with these:
• Are agents not supervised (with a heartbeat, like in Prefect v1)?
• When an agent is lost (network issue, OOM kill, and more) in the middle of a run, the flow stays in the "Running" state. Is this normal behavior? How can I detect this problem?
Thanks for everything, Matt
🙌 3
✅ 1

Anna Geller

10/19/2022, 11:28 AM
#1 We are working on a feature that will track agent health based on when each agent last polled the Prefect API for work from a work queue. You can follow #announcements and the release notes to stay up to date.
#2 Good question. It depends on the infrastructure block. When you use KubernetesJob, DockerContainer, or one of the serverless containerized infrastructure blocks, the flow run's container or pod can run to completion even if the agent that started it goes down. But when you use a Process block on a local server, the flow run runs directly within the agent process, so when the agent goes down the flow run cannot complete, and you would need to delete the run (e.g. from the UI) if the lingering Running state is confusing. Usually such a run would be marked as Crashed, but crashes can sometimes be hard to detect, so feel free to open a GitHub issue describing it in more detail if you see different behavior from what I described, or if this is still confusing.
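Until crash detection covers every case, one way to spot runs stuck in Running is to flag any run that has been Running for longer than you would ever expect. Here is a minimal sketch of that check. It assumes you have already pulled flow run records from somewhere (the Prefect API, a UI export); `RunRecord`, `find_suspect_runs`, the run names, and the two-hour threshold are all hypothetical and not part of Prefect.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class RunRecord:
    """Hypothetical, simplified view of a flow run (not a Prefect class)."""
    name: str
    state: str
    start_time: datetime


def find_suspect_runs(
    runs: Iterable[RunRecord], now: datetime, max_runtime: timedelta
) -> List[RunRecord]:
    """Return runs that are still Running after more than max_runtime."""
    return [
        run
        for run in runs
        if run.state == "Running" and now - run.start_time > max_runtime
    ]


# Made-up sample data, for illustration only.
now = datetime(2022, 10, 19, 12, 0, tzinfo=timezone.utc)
runs = [
    RunRecord("etl-nightly", "Running", now - timedelta(hours=6)),
    RunRecord("report-hourly", "Running", now - timedelta(minutes=10)),
    RunRecord("cleanup", "Completed", now - timedelta(hours=8)),
]

suspects = find_suspect_runs(runs, now, timedelta(hours=2))
print([run.name for run in suspects])
```

A flagged run is only a candidate: a long-running but healthy flow looks the same, so the threshold needs to be chosen per flow.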
Generally speaking, a good mental model is that your agent is a lightweight process that should run 24/7 and should never be down (there are ways to accomplish that via automation, e.g. a Kubernetes deployment, an ECS service, etc.)
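For a local server without Kubernetes or ECS, the same idea can be sketched as a small supervisor that restarts the agent whenever it exits. This is only an illustration under assumptions: `supervise` is a hypothetical helper, the queue name `default` in the usage comment is assumed, and a real process manager (systemd, supervisord, a Kubernetes deployment) is usually the better choice.

```python
import subprocess
import time
from typing import List, Optional


def supervise(
    cmd: List[str],
    max_restarts: Optional[int] = None,
    delay_seconds: float = 5.0,
) -> int:
    """Run cmd and restart it every time it exits.

    Restarts forever when max_restarts is None. Returns the number of
    times the command was started (initial start plus restarts).
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.call(cmd)  # blocks until the process exits
        print(f"start #{starts} exited with code {exit_code}")
        if max_restarts is not None and starts > max_restarts:
            return starts
        time.sleep(delay_seconds)  # brief pause before restarting


# Hypothetical usage (queue name "default" is an assumption):
# supervise(["prefect", "agent", "start", "-q", "default"])
```

Note that restarting the agent does not rescue a Process-block flow run that was already in flight when the agent died; it only keeps new work from piling up.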
:upvote: 1
Btw I love the CLI too 🙌