Ben Tasker

07/12/2023, 12:57 PM
Hey, are proactive automations functional in Prefect Cloud? I can't seem to get one to trigger at all and am increasingly sure it isn't me. I'm playing around with a simple HTTP status checker, which sends an event saying whether the target is UP or DOWN. At the moment I've hardcoded it to report DOWN (https://github.com/bentasker/python_status_checker/blob/main/status_check.py#L148). If I add a reactive automation:
{
  "match": {
    "prefect.resource.id": "h1.*"
  },
  "match_related": {},
  "after": [],
  "expect": [
    "h1.status.DOWN"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Reactive",
  "threshold": 1,
  "within": 0
}
That fires just fine. Turning it into a Proactive check, though:
{
  "match": {
    "prefect.resource.id": "h1.*"
  },
  "match_related": {},
  "after": [
    "h1.status.DOWN"
  ],
  "expect": [
    "h1.status.UP"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Proactive",
  "threshold": 0,
  "within": 60
}
The automation never shows as triggered (I wondered whether the doc was wrong and swapped `after` and `expect`, but no dice). What am I missing?
The raw event looks like this:
{
  "id": "1c0905b6-aaec-4a79-bbee-3405b3fe07a6",
  "account": "f188f746-cb64-473b-a006-9641d17770d2",
  "event": "h1.status.DOWN",
  "occurred": "2023-07-12T13:00:24.258Z",
  "payload": {
    "url": "https://mastodon.bentasker.co.uk/",
    "reason": "I'm a teapot.... nofail",
    "http_status": "418 (real: 200"
  },
  "received": "2023-07-12T13:00:24.486Z",
  "related": [
    {
      "prefect.resource.id": "prefect.flow-run.3c1a01c9-56bb-4850-8b73-a57e7131baef",
      "prefect.resource.name": "crazy-oryx",
      "prefect.resource.role": "flow-run"
    },
    {
      "prefect.resource.id": "prefect.task-run.0f848ab1-a9cf-447e-8e29-95f43f281494",
      "prefect.resource.name": "do_h1_check-0",
      "prefect.resource.role": "task-run"
    },
    {
      "prefect.resource.id": "prefect.flow.ae43f61b-5a9a-48e3-8ab2-9e81e27b17c5",
      "prefect.resource.name": "main",
      "prefect.resource.role": "flow"
    }
  ],
  "resource": {
    "prefect.resource.id": "h1.https-mastodon-bentasker-co-uk-"
  },
  "workspace": "12289eac-ebed-4929-b43a-cd6471c8a206"
}
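For anyone reproducing this, here is a minimal sketch of how an event shaped like the one above might be assembled in Python. The `h1.` prefix and the rule that collapses each run of non-alphanumeric URL characters into a single `-` are assumptions inferred from the example resource ID, not taken from status_check.py. `resource_id_for` and `build_status_event` are hypothetical names. The sketch only builds the fields; the real checker presumably hands them to Prefect's event API (for example `prefect.events.emit_event`), which is not called here.

```python
import re
from datetime import datetime, timezone
from fnmatch import fnmatchcase


def resource_id_for(url: str) -> str:
    """Hypothetical helper: collapse each run of non-alphanumeric characters
    in the URL into a single '-' and prefix the result with 'h1.'."""
    return "h1." + re.sub(r"[^A-Za-z0-9]+", "-", url)


def build_status_event(url: str, is_up: bool, reason: str, http_status: str) -> dict:
    """Build the core fields of an UP/DOWN status event (no network calls)."""
    return {
        "event": "h1.status.UP" if is_up else "h1.status.DOWN",
        "occurred": datetime.now(timezone.utc).isoformat(),
        "resource": {"prefect.resource.id": resource_id_for(url)},
        "payload": {"url": url, "reason": reason, "http_status": http_status},
    }


event = build_status_event("https://mastodon.bentasker.co.uk/", False,
                           "I'm a teapot.... nofail", "418")
rid = event["resource"]["prefect.resource.id"]
print(event["event"])            # h1.status.DOWN
print(rid)                       # h1.https-mastodon-bentasker-co-uk-
# fnmatchcase approximates the trailing-wildcard "match" pattern in the triggers.
print(fnmatchcase(rid, "h1.*"))  # True
```

Because the resource ID matches `h1.*`, both the reactive and proactive triggers above select this event; the difference between them is entirely in `after`, `expect`, `threshold` and `within`.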

Christopher Boyd

07/12/2023, 1:20 PM
I don’t know that I’ve particularly tested proactive automations myself - I don’t have a good answer here at the moment without some testing, but generally you just want the effect: if a thing HASN’T happened, trigger an event?

Ben Tasker

07/12/2023, 1:26 PM
Not exactly, it's more "if $thing happens and $other_thing doesn't happen within n minutes, trigger an event". So as an easy example - if my http probe has reported a service down, and it doesn't then report as up within 5 minutes, send a notification email

Christopher Boyd

07/12/2023, 1:27 PM
Oooo, I think I’ve seen this in the past. I don’t remember the specifics, but I remember it was hard to trigger on both
like one reset the other or something
it’s been a while

Ben Tasker

07/12/2023, 1:28 PM
I had been wondering whether it might be something like that. The automation never shows as triggered at all in the UI, but it's not clear whether it'd show as triggered when a matching event comes in, or only after the `within` period has elapsed (meaning a subsequent reset would lead to it not showing at all).
FWIW, the docs give a similar example:
> Or, if your work queue enters an unhealthy state and you want your trigger to execute an action if it doesn't recover within 30 minutes, you could paste in the following trigger configuration:

Will Raphaelson

07/12/2023, 3:55 PM
Ahh, so I think you want that threshold to be 1: if it doesn’t get 1 event in the `within` period, it will fire. Currently it's saying "fire if it doesn't get 0", but it IS getting zero, so it doesn’t trigger.
And git blame reveals that it was yours truly who put that example with 0 in the docs. Sorry about that! Fixing now.

Ben Tasker

07/12/2023, 4:46 PM
awesome, thanks! I'll give that a try now
I did wonder about that, but figured it must mean it'd never reach 1 😄
Hmmm, still no dice:
{
  "match": {
    "prefect.resource.id": "h1.*"
  },
  "match_related": {},
  "after": [
    "h1.status.DOWN"
  ],
  "expect": [
    "h1.status.UP"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Proactive",
  "threshold": 1,
  "within": 60
}
There's nothing touching that resource after that in the event feed
I don't know if something changed on your end, but it suddenly started working - I got notifications in my inbox about 15 mins ago

Will Raphaelson

07/12/2023, 5:55 PM
Yeah, that’s odd. If the `within` interval were larger, I might expect it to go through a full cycle to reset, but it’s only a minute. Well, glad it’s working.

Chris Guidry

07/12/2023, 8:44 PM
Hi Ben, thanks for reaching out! I'm an engineer on Prefect Cloud and I wanted to take a deeper dive on this to make sure things are working as we expect. I'm reviewing the log of events and believe the system is working as designed, and I think we should try to better explain in our documentation how `after`, `expect`, and `within` all interact. I'll take your automation and break it down into English, and I hope that will help clarify things.
{
  "after": [
    "h1.status.DOWN"
  ],
  "expect": [
    "h1.status.UP"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "threshold": 1,
  "within": 60
}
Okay, taking these fields, here's how you'd translate this into words: after I see a "DOWN" event for a specific resource, expect at least 1 "UP" event for that same resource within 60 seconds (of that previous "DOWN" event). If I don't get the "UP", fire off the associated actions.
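That English reading can be turned into a small, self-contained simulation. This is a toy model under stated assumptions, not Prefect Cloud's implementation: `after` and `expect` are simplified to a single event name each, `for_each` is fixed to the resource ID, a repeated "DOWN" does not restart an already-open window, and an "UP" landing exactly at the window's end does not count. `Event` and `evaluate_proactive` are hypothetical names, and the resource IDs in the usage are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    name: str         # event name, e.g. "h1.status.DOWN"
    resource_id: str  # the event's "prefect.resource.id"
    at: float         # seconds relative to an arbitrary start time


def evaluate_proactive(events: List[Event], after: str, expect: str,
                       threshold: int, within: float, now: float) -> List[str]:
    """Toy model of a Proactive trigger with for_each=prefect.resource.id.

    Per resource: an `after` event opens a window of `within` seconds; every
    `expect` event for that resource inside the window is counted. When the
    window closes with fewer than `threshold` counted events, the resource fires.
    Returns the IDs of resources that would have fired by time `now`.
    """
    open_windows: Dict[str, float] = {}  # resource ID -> window start time
    counts: Dict[str, int] = {}          # resource ID -> expect events seen
    fired: List[str] = []

    def close_expired(t: float) -> None:
        # Close every window whose `within` period has elapsed by time t.
        for rid in list(open_windows):
            if t >= open_windows[rid] + within:
                if counts[rid] < threshold:
                    fired.append(rid)
                del open_windows[rid], counts[rid]

    for ev in sorted(events, key=lambda e: e.at):
        close_expired(ev.at)
        if ev.name == after and ev.resource_id not in open_windows:
            # Assumption: a repeat `after` event does not restart an open window.
            open_windows[ev.resource_id] = ev.at
            counts[ev.resource_id] = 0
        elif ev.name == expect and ev.resource_id in open_windows:
            counts[ev.resource_id] += 1

    close_expired(now)
    return fired


events = [
    Event("h1.status.DOWN", "h1.site-a", at=0),
    Event("h1.status.UP", "h1.site-a", at=30),    # site-a recovers inside 60 s
    Event("h1.status.DOWN", "h1.site-b", at=10),  # site-b never reports UP
]
print(evaluate_proactive(events, "h1.status.DOWN", "h1.status.UP",
                         threshold=1, within=60, now=120))  # ['h1.site-b']
print(evaluate_proactive(events, "h1.status.DOWN", "h1.status.UP",
                         threshold=0, within=60, now=120))  # []
```

The second call shows the original problem in miniature: with `threshold` set to 0, "fewer than 0 UP events" can never be true, so nothing fires even for the resource that stays down.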
If I'm interpreting your event feed correctly, while you had the threshold set to `0`, I believe it was saying "expect at least `0` 'UP' events", and that was always true!
I think later, after you changed the threshold to `1`, the automation was able to fire again correctly. Does that sound right?

Ben Tasker

07/13/2023, 8:06 AM
Hey, sorry - timezones 🙂 Thanks!
> I think later after you changed the threshold to `1`, the automation was able to fire again correctly. Does that sound right?
Yep, dead on the mark. The only thing is that, for some reason, it took around 30 minutes after changing `threshold` for it to start working (so when I'd originally tested it, it appeared not to). It seems to work consistently now: if I toggle the status, I get a notification in good time.
> we should try to better explain in our documentation how `after`, `expect`, and `within` all interact.
One thing I thought might be worth flagging: for `proactive`, the relationship between these fields makes semantic sense (after x, expect y within z). But `reactive` automations instead put the triggering event in `expect` rather than `after`, which is inconsistent with that semantic model (what you're actually saying there is "create an event handler that expects/triggers on x"). This difference also means that, if an automation is being moved from a `reactive` stance to a `proactive` one, there's a reasonable chance the operator will forget to move their `expect` value into `after`. It's probably too late to change the way it works, but you might want to call it out quite loudly in your docs: because the automation wasn't working, I was left wondering whether the doc had it backwards.

Will Raphaelson

07/13/2023, 3:44 PM
That's good feedback, thank you, Ben. I think we can easily make the docs clearer here, but we are also open to changing the interface itself (in a backwards-compatible way). I can noodle on this a bit and send you any issues I create.

Chris Guidry

07/13/2023, 3:47 PM
> The only thing, is for some reason, it took around 30 minutes after changing `threshold` for it to start working (so when I'd originally tested it, it appeared not to).
You know what, I think I can see that in your event feed too. That's concerning; I'll keep digging on that. Thanks for the feedback on the docs: we should probably dedicate a section to clarifying what the fields mean for each posture, that's very valid. `expect` and `after` do have a consistent meaning for both `Reactive` and `Proactive` ("after these events, expect those events"), but the use cases for them are definitely different between the two postures. Great call!
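To make Chris's point concrete, here is a simplified sketch, not Prefect's implementation: in both postures the trigger counts the `expect` events that follow any `after` events inside the `within` window, and the postures differ only in which side of `threshold` fires the trigger. The comparisons reflect a reading of the Prefect automation docs (Reactive fires once at least `threshold` matching events arrive; Proactive fires when fewer than `threshold` arrive); `should_fire` is a hypothetical name.

```python
def should_fire(posture: str, expected_count: int, threshold: int) -> bool:
    """Simplified model of the two postures.

    In both, `expected_count` is the number of `expect` events seen (after any
    `after` events) inside the `within` window; only the firing condition flips.
    """
    if posture == "Reactive":
        return expected_count >= threshold  # enough matching events arrived
    if posture == "Proactive":
        return expected_count < threshold   # too few matching events arrived
    raise ValueError(f"unknown posture: {posture!r}")


print(should_fire("Reactive", 1, threshold=1))   # True: one DOWN event seen
print(should_fire("Proactive", 0, threshold=1))  # True: no UP within the window
print(should_fire("Proactive", 1, threshold=1))  # False: UP arrived in time
print(should_fire("Proactive", 0, threshold=0))  # False: 0 < 0 never holds
```

The last line is the configuration Ben started with: a Proactive trigger with `threshold` 0 can never fire, whatever the event feed contains.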