# prefect-community
y:
Work Queues becoming unhealthy. I have code running on a cloud server. The agent on the server is started using
`nohup prefect agent start --work-queue "<name>" > ~/tmp/prefect_agent.log &`
It used to work fine, but recently I noticed it becomes “unhealthy” and stops running flows. Any idea why this might happen and how to prevent it? Thanks
s:
My agents also went down with an HTTP 500 error. This was due to the incident reported over at prefect.status.io (see pic). It's stable again for me now, though. You can look at using something like systemd/supervisor to keep your agents running if you're on Linux.
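If you go the supervisor route, a minimal sketch of a program entry might look like the following. The file path, user, directory, and log locations here are assumptions to adapt to your setup, and `<name>` is the same placeholder queue name as in the nohup command above:
```ini
; Hypothetical example: /etc/supervisor/conf.d/prefect_agent.conf
; (path, user, and log locations are placeholders, not from this thread)
[program:prefect_agent]
command=prefect agent start --work-queue "<name>"
directory=/home/prefect
user=prefect
; restart the agent automatically if it exits
autostart=true
autorestart=true
stdout_logfile=/var/log/prefect_agent.out.log
stderr_logfile=/var/log/prefect_agent.err.log
```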
y:
I use Debian Linux. Just making sure I understand… I should use systemd/supervisor instead of `nohup`?
s:
I used to run my Prefect 1 agents using systemd like this:
```ini
# This file goes in /etc/systemd/system/prefect_agent.service
# It is to start the long-running service for Prefect's LocalAgent

[Unit]
Description="Run Prefect Local Agent"
After=prefect_server.service

[Service]
User=prefect
WorkingDirectory=/home/prefect/prefect
EnvironmentFile=/etc/default/prefect_agent_keys.key
# Need to spell out poetry because ExecStart needs absolute paths
ExecStart=/home/prefect/.poetry/bin/poetry run \
    prefect agent start --work-queue default aap-data-transfer
Restart=on-failure
RestartSec=180

# Run this service anytime the system boots:
[Install]
WantedBy=multi-user.target
```
I now run my agents manually in tmux windows, but tbh I should move back to using something to restart the service when it fails
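For reference, installing and enabling a unit like the one above is just the standard systemctl steps (assuming the file is saved as /etc/systemd/system/prefect_agent.service, as its comment says):
```sh
# Make systemd re-read unit files after adding/editing the service
sudo systemctl daemon-reload

# Start the agent now and have it start again on every boot
sudo systemctl enable --now prefect_agent.service

# Verify it's running and tail recent logs
systemctl status prefect_agent.service
journalctl -u prefect_agent.service -n 50
```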
y:
thanks
Why does the work pool become unhealthy several times a day?
I have about 6 flows; some run every 15 minutes, some less frequently. I get several warnings a day about late flows and unhealthy work queues. How can I limit the alerts to significant ones? How do I set a threshold for when an alert should be generated?
I see… in the Automation you can choose “enters” or “stays in” and set the duration… this solves the issue.