# prefect-community
y:
Work Queues becoming unhealthy. I have code running on a cloud server. The agent on the server is started using
`nohup prefect agent start --work-queue "<name>" > ~/tmp/prefect_agent.log &`
It used to work fine, but recently I noticed it becomes “unhealthy” and stops running flows. Any idea why this might happen and how to prevent it? Thanks
s:
My agents also went down with an HTTP 500 error. This was due to the incident reported over at prefect.status.io (see pic). It's stable again for me now, though. You can look at using something like systemd/supervisor to keep your agents running if you're on Linux.
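If you go the supervisor route, a minimal sketch of a program entry might look like the following. The file path, user, directory, and log locations here are assumptions to adapt to your setup, and `<name>` is the same placeholder queue name as in the nohup command above:
```ini
; Hypothetical example: /etc/supervisor/conf.d/prefect_agent.conf
; (path, user, and log locations are placeholders, not from this thread)
[program:prefect_agent]
command=prefect agent start --work-queue "<name>"
directory=/home/prefect
user=prefect
; restart the agent automatically if it exits
autostart=true
autorestart=true
stdout_logfile=/var/log/prefect_agent.out.log
stderr_logfile=/var/log/prefect_agent.err.log
```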
y:
I use Debian Linux. Just making sure I understand… I should use systemd/supervisor instead of `nohup`?
s:
I used to run my Prefect 1 agents using systemd like this:
```ini
# This file goes in /etc/systemd/system/prefect_agent.service
# It is to start the long-running service for Prefect's LocalAgent

[Unit]
Description="Run Prefect Local Agent"
After=prefect_server.service

[Service]
User=prefect
WorkingDirectory=/home/prefect/prefect
EnvironmentFile=/etc/default/prefect_agent_keys.key
# Need to spell out poetry because ExecStart needs absolute paths
ExecStart=/home/prefect/.poetry/bin/poetry run \
    prefect agent start --work-queue default aap-data-transfer
Restart=on-failure
RestartSec=180

# Run this service anytime the system boots:
[Install]
WantedBy=multi-user.target
```
I now run my agents manually in tmux windows, but tbh I should move back to using something to restart the service when it fails
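For reference, installing and enabling a unit like the one above is just the standard systemctl steps (assuming the file is saved as /etc/systemd/system/prefect_agent.service, as its comment says):
```sh
# Make systemd re-read unit files after adding/editing the service
sudo systemctl daemon-reload

# Start the agent now and have it start again on every boot
sudo systemctl enable --now prefect_agent.service

# Verify it's running and tail recent logs
systemctl status prefect_agent.service
journalctl -u prefect_agent.service -n 50
```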
y:
thanks
Why does the work pool become unhealthy several times a day?
I have about 6 flows; some run every 15 minutes, some less frequently. I get several warnings a day about late flows and unhealthy work queues. How can I limit the alerts to significant ones? How do I set a threshold for when an alert should be generated?
I see… in the Automation you can choose “enters” or “stays in” and set the duration… this solves the issue.