Ben Tasker
07/12/2023, 12:57 PM
{
  "match": {
    "prefect.resource.id": "h1.*"
  },
  "match_related": {},
  "after": [],
  "expect": [
    "h1.status.DOWN"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Reactive",
  "threshold": 1,
  "within": 0
}
That fires just fine.
Turning it into a Proactive check though
{
  "match": {
    "prefect.resource.id": "h1.*"
  },
  "match_related": {},
  "after": [
    "h1.status.DOWN"
  ],
  "expect": [
    "h1.status.UP"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Proactive",
  "threshold": 0,
  "within": 60
}
The automation never shows as triggered (I wondered if perhaps the doc was wrong and swapped after and expect, but no dice).
What am I missing?
{
  "id": "1c0905b6-aaec-4a79-bbee-3405b3fe07a6",
  "account": "f188f746-cb64-473b-a006-9641d17770d2",
  "event": "h1.status.DOWN",
  "occurred": "2023-07-12T13:00:24.258Z",
  "payload": {
    "url": "https://mastodon.bentasker.co.uk/",
    "reason": "I'm a teapot.... nofail",
    "http_status": "418 (real: 200"
  },
  "received": "2023-07-12T13:00:24.486Z",
  "related": [
    {
      "prefect.resource.id": "prefect.flow-run.3c1a01c9-56bb-4850-8b73-a57e7131baef",
      "prefect.resource.name": "crazy-oryx",
      "prefect.resource.role": "flow-run"
    },
    {
      "prefect.resource.id": "prefect.task-run.0f848ab1-a9cf-447e-8e29-95f43f281494",
      "prefect.resource.name": "do_h1_check-0",
      "prefect.resource.role": "task-run"
    },
    {
      "prefect.resource.id": "prefect.flow.ae43f61b-5a9a-48e3-8ab2-9e81e27b17c5",
      "prefect.resource.name": "main",
      "prefect.resource.role": "flow"
    }
  ],
  "resource": {
    "prefect.resource.id": "h1.https-mastodon-bentasker-co-uk-"
  },
  "workspace": "12289eac-ebed-4929-b43a-cd6471c8a206"
}
Christopher Boyd
07/12/2023, 1:20 PM
If a thing HASNT happened, trigger an event?
Ben Tasker
07/12/2023, 1:26 PM
Christopher Boyd
07/12/2023, 1:27 PM
Ben Tasker
07/12/2023, 1:28 PM
within value (meaning a subsequent reset would lead to it not showing at all)
Or, if your work queue enters an unhealthy state and you want your trigger to execute an action if it doesn't recover within 30 minutes, you could paste in the following trigger configuration:
Will Raphaelson
07/12/2023, 3:55 PM
Ben Tasker
07/12/2023, 4:46 PM
{
  "match": {
    "prefect.resource.id": "h1.*"
  },
  "match_related": {},
  "after": [
    "h1.status.DOWN"
  ],
  "expect": [
    "h1.status.UP"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Proactive",
  "threshold": 1,
  "within": 60
}
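For readers following along, the change from the Reactive trigger at the top of the thread to this Proactive one can be sketched in Python as a transformation on plain dicts. This is only an illustration: the helper name `reactive_to_proactive` and its parameters are hypothetical, it does not call any Prefect API, and the default threshold of 1 simply mirrors the configuration posted here.

```python
import copy
import json


def reactive_to_proactive(trigger, recovery_events, within_seconds, threshold=1):
    """Hypothetical helper: derive a Proactive trigger from a Reactive one.

    The events the Reactive trigger listed under "expect" become the
    "after" events, and the recovery events become the new "expect".
    """
    proactive = copy.deepcopy(trigger)  # leave the caller's dict untouched
    proactive["after"] = list(trigger["expect"])
    proactive["expect"] = list(recovery_events)
    proactive["posture"] = "Proactive"
    proactive["threshold"] = threshold
    proactive["within"] = within_seconds
    return proactive


# The Reactive trigger from the first message in this thread.
reactive = {
    "match": {"prefect.resource.id": "h1.*"},
    "match_related": {},
    "after": [],
    "expect": ["h1.status.DOWN"],
    "for_each": ["prefect.resource.id"],
    "posture": "Reactive",
    "threshold": 1,
    "within": 0,
}

print(json.dumps(reactive_to_proactive(reactive, ["h1.status.UP"], 60), indent=2))
```

The printed dict has the same fields and values as the Proactive configuration above.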
Will Raphaelson
07/12/2023, 5:55 PM
Chris Guidry
07/12/2023, 8:44 PM
we should try to better explain in our documentation how after, expect, and within all interact.
I'll take your automation and break it down into English, and I hope that will help clarify things.
{
  "after": [
    "h1.status.DOWN"
  ],
  "expect": [
    "h1.status.UP"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "threshold": 1,
  "within": 60
}
Okay, taking these fields, here's how you'd translate this into words:
After I see a "DOWN" event for a specific resource, expect at least 1 "UP" event for that same resource within 60 seconds (of that previous "DOWN" event). If I don't get the "UP", fire off the associated actions.
With the threshold at 0, I believe it was saying "expect at least 0 'UP' events" and that was always true!
I think later after you changed the threshold to 1, the automation was able to fire again correctly. Does that sound right?
Ben Tasker
07/13/2023, 8:06 AM
> I think later after you changed the threshold to 1, the automation was able to fire again correctly. Does that sound right?
Yep, dead on the mark. The only thing, is for some reason, it took around 30 minutes after changing threshold for it to start working (so when I'd originally tested it, it appeared not to).
It seems to consistently work now, if I toggle the status I get a notification in good time.
> we should try to better explain in our documentation how after, expect, and within all interact.
One thing I thought might be worth flagging. For proactive, the relationship between these makes semantic sense (after x expect y within z).
But, reactive notifications instead use expect rather than after which is inconsistent with that semantic model (because what you're actually saying here is - create an event handler that expects/triggers on x)
This difference also means that, if an automation is being moved from a reactive stance to a proactive one, there's a reasonable probability the operator will forget to move their expect value into after.
It's probably too late to change the way it works, but you might want to quite loudly call it out in your docs - because the automation wasn't working, I was left wondering whether the doc perhaps had it backwards instead.
Will Raphaelson
07/13/2023, 3:44 PM
Chris Guidry
07/13/2023, 3:47 PM
> The only thing, is for some reason, it took around 30 minutes after changing threshold for it to start working (so when I'd originally tested it, it appeared not to).
You know what, I think I can see that in your event feed too. That's concerning, I'll keep digging on that. Thanks for the feedback on the docs, we should probably dedicate a section that clarifies what the fields mean for each posture, that's very valid.
expect and after do both have consistent meaning for Reactive and Proactive ("after these events expect those events"), but the use cases for them are definitely different between the two postures, great call
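The Proactive reading Chris gives above ("after DOWN, expect at least threshold UP events within the window, otherwise fire") can be sketched as a toy model in Python. This is an assumption-laden simplification for illustration only, not Prefect's implementation: the `Event` class and `proactive_should_fire` function are hypothetical names, and events are plain in-memory records with timestamps in seconds.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    resource_id: str
    at: float  # seconds since an arbitrary start


def proactive_should_fire(events, after, expect, threshold, within):
    """Toy model: return the resource ids for which an "after" event was
    followed by fewer than `threshold` "expect" events within `within` seconds."""
    fired = []
    for start in events:
        if start.name not in after:
            continue
        # Count expected events for the same resource inside the window.
        seen = sum(
            1
            for e in events
            if e.name in expect
            and e.resource_id == start.resource_id
            and start.at < e.at <= start.at + within
        )
        if seen < threshold:
            fired.append(start.resource_id)
    return fired


down_only = [Event("h1.status.DOWN", "h1.example", 0.0)]
recovered = down_only + [Event("h1.status.UP", "h1.example", 30.0)]

# threshold 1: a DOWN with no UP inside 60 seconds fires.
print(proactive_should_fire(down_only, ["h1.status.DOWN"], ["h1.status.UP"], 1, 60))
# threshold 0: "at least 0 UP events" is always satisfied, so nothing fires.
print(proactive_should_fire(down_only, ["h1.status.DOWN"], ["h1.status.UP"], 0, 60))
# threshold 1 with an UP after 30 seconds: the expectation is met, nothing fires.
print(proactive_should_fire(recovered, ["h1.status.DOWN"], ["h1.status.UP"], 1, 60))
```

In this model a threshold of 0 can never fire, which matches the behaviour seen with the first Proactive configuration in the thread.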