
Prefect Community

Does anyone's work queue periodically go unhealthy? It happens every couple of days, and then I have to restart the agent. Automated restarts don't seem to be working, or I can't set them up right, but it'd be nice for the service not to crash in the first place. Is there a recommended way to diagnose the root cause of the unhealthy agent states?

Hey Albert, where/how are you hosting the agent? Definitely seems like daemonizing it is going to help here. What have you tried so far?

I added an onfailure item to the systemd startup script and tested it by killing the service, and it seemed to respawn ... but in an actual scenario the process doesn't respawn after it has died

so, I don't know if the process actually died, maybe it's in some corrupted state

that's why I want to look deeper into the issue instead of putting on a bandaid
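
One way to check whether the process actually died is to ask systemd directly. systemd's restart policies react to the service process exiting, so an agent that is still running but stuck would not be respawned. Below is a minimal sketch, assuming the agent runs as a systemd unit; the helper names and the `alive`/`failed`/`other` labels are made up for illustration, and `NRestarts` is only reported by reasonably recent systemd versions.

```python
import subprocess

# Properties requested from `systemctl show`.
PROPERTIES = "ActiveState,SubState,NRestarts,ExecMainStatus"


def parse_show_output(text: str) -> dict:
    """Turn the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not KEY=VALUE
            props[key.strip()] = value.strip()
    return props


def read_unit_state(unit: str) -> dict:
    """Ask systemd for the current state of one unit."""
    result = subprocess.run(
        ["systemctl", "show", unit, f"--property={PROPERTIES}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_show_output(result.stdout)


def classify(props: dict) -> str:
    """Rough, hypothetical labels for the states we care about here."""
    if props.get("ActiveState") == "active" and props.get("SubState") == "running":
        # Process exists; an unhealthy queue then points at a hang or an
        # API/auth problem rather than a crash.
        return "alive"
    if props.get("ActiveState") == "failed":
        return "failed"
    return "other"
```

Calling `classify(read_unit_state("prefect-agent.service"))` while the work queue shows unhealthy would tell the two cases apart (the unit name is a placeholder for whatever the service is actually called). If it returns `alive`, the restart rule never had a reason to fire, and the agent logs from `journalctl -u <unit>` are the next place to look.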


Hello everyone,
Something similar happened to us, and the cause was that the API key had expired.
Maybe you should check that too.
I hope this helps.