
jpuris

02/16/2023, 9:06 AM
Heya! Last night we received quite a lot of Slack notifications about a flow being healthy from a single Automation over a span of roughly 2 hours. All but 2 came without a valid payload 😕 The number of notifications can be attributed to the following config:
Trigger Type: Work queue health
• Work Queues: <single production work queue>
• Work Queue: Stays in > Healthy
• For: 10 Seconds
Our Slack notification template looks as follows:
Name: {{ work_queue.name }}
Last polled: {{ work_queue.last_polled }}
Late run count: {{ work_queue.late_runs_count }}
URL: {{ work_queue|ui_url }}
But... why do all but 2 notifications have no values? (see screenshot below). The two valid notifications are:
1. The first one at 21:53 CEST
2. At 22:54 CEST (visible in screenshot)
edit: For the record, we do not intend to run such a trigger condition (it was a misconfiguration). We do not really care about being notified every 10 seconds about a queue being healthy 😅 Hence the subject of the notifications in the screenshot is misleading.

Will Raphaelson

02/16/2023, 3:03 PM
thanks for the flag @jpuris - I believe this may have been a one time blip as a result of a migration we applied on the back end, sorry about that! I think if you fix any misconfiguration on your side things should work as expected, but please do let me know if this crops up again and we can get to the bottom of it.

jpuris

02/16/2023, 3:05 PM
Hi, @Will Raphaelson! Are you saying the notifications being empty is because of the migration? 🤔 The misconfiguration was creating the many notifications we had received, but the issue is that almost all of them were with no values.
I can try to reproduce this, if you'd like 🤷

Will Raphaelson

02/16/2023, 3:12 PM
The frequency was for sure because of the migration, and it may have something to do with the empty parameter values. Basically, the trigger needs a valid triggering event to materialize those values, and I think they were just missing when these fired. I will try to reproduce too in a few hours and make sure these values are populating, but the more the merrier: if you want to file a repro example at https://github.com/PrefectHQ/prefect, it would be helpful and appreciated.

jpuris

02/16/2023, 3:15 PM
Ah I see. So the trigger spec
Trigger Type: Work queue health
• Work Queues: <single production work queue>
• Work Queue: Stays in > Healthy
• For: 10 Seconds
should not have generated a gazillion events? If so, then it makes sense! The two "valid" events we received were exactly 1 hour apart, so I suppose this condition would generate hourly alerts of the queue "being healthy" for more than 10 seconds 🤷

Will Raphaelson

02/16/2023, 5:04 PM
So how the "stays in" construction works under the hood is that it watches for a healthy event, and then watches for the absence of an unhealthy event within ten seconds, which I believe your conditions triggered. On the parameter values being blank, all of mine are populating except for the “count late runs” field, which I think is a legit bug and I filed it internally to clean up, so thanks for surfacing that.
I think everything is behaving as expected right now, let me know if you encounter any other issues?
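
(Editor's sketch) The "stays in" behaviour described above can be illustrated with a small simulation. This is not Prefect's implementation: `HealthEvent` and `stays_in_firings` are hypothetical names, and the assumption that every healthy event opens its own window is only one reading of the description in this thread.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class HealthEvent:
    """Hypothetical stand-in for a work queue health event."""
    at: float       # seconds since an arbitrary start
    healthy: bool   # True = "healthy" event, False = "unhealthy" event


def stays_in_firings(events: Iterable[HealthEvent], within: float, now: float) -> List[float]:
    """Return the times at which a 'stays in Healthy for `within` seconds' rule would fire.

    Assumption: each healthy event opens a window of `within` seconds, and the
    rule fires at the end of the window if no unhealthy event arrived inside it.
    """
    ordered = sorted(events, key=lambda e: e.at)
    firings = []
    for ev in ordered:
        if not ev.healthy:
            continue
        deadline = ev.at + within
        if deadline > now:
            continue  # window has not elapsed yet, nothing to decide
        interrupted = any(
            (not other.healthy) and ev.at < other.at <= deadline
            for other in ordered
        )
        if not interrupted:
            firings.append(deadline)
    return firings


example = [
    HealthEvent(at=0.0, healthy=True),
    HealthEvent(at=5.0, healthy=False),   # cancels the window opened at t=0
    HealthEvent(at=20.0, healthy=True),
    HealthEvent(at=80.0, healthy=True),
]
print(stays_in_firings(example, within=10.0, now=100.0))  # [30.0, 90.0]
```

Under this reading, a queue that keeps emitting healthy events and never turns unhealthy would fire once per healthy event, which would fit a burst of "healthy" notifications from a 10 second window.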

Stéphan Taljaard

02/16/2023, 7:06 PM
I also had queue health check misfires this week, where I got an alert stating my queue is unhealthy. Then I click the link and find that it is not unhealthy. My suspicion is that the time check (unhealthy for 9 minutes) was ignored, and as soon as my batch of flows was due, the alert fired, even if a flow was only seconds late. Should I message in this thread if it happens again?

Will Raphaelson

02/16/2023, 7:06 PM
yes please, thanks for that. im also monitoring on our side.

Stéphan Taljaard

02/24/2023, 8:13 AM
Here's a misfire that just happened
Unhealthy work queue
Name: default
Last polled: 2023-02-24T08:10:25.029779+00:00
Late run count:
URL: https://app.prefect.cloud/account/2a0672be-6e64-4081-bdfd-a4dfded5a802/workspace/b70a86f1-659f-408f-add7-9665c6bfa327/work-queues/work-queue/2aa024db-13b8-4203-a89b-2db6487959ad
Prefect Notifications | Today at 10:10