# prefect-cloud
c
I just went into Cloud for the first time in a day or two and it looks like my notification rules were migrated. They didn’t pick up a failed run from this AM. Additionally, the notification method (Slack) is now sealed up in an “anonymous block” that I can’t see or edit. Do I need to recreate the automations?
w
Hi Chris, thanks for flagging. The old style of notification was migrated to automations about a month ago, and yes, you are correct that we chose to keep the old anonymous blocks we were using under the hood so as to (ideally) not disrupt service. If you’d like to change the Slack block, then yes, recreating the automation will give you that flexibility moving forward. This shouldn’t have caused any missed alerting though - if you have reproduction steps, would you file an issue in our GitHub repository?
c
OK. I’ll do so. I’m also seeing a 15-minute delay in notifications coming through, which has made evaluating this a little confusing.
Actually I’m not really sure what I can put in a bug report. All I know is that the notification did not fire this morning on job failure like it always has, and that when I retried the job after modifying the automation to fire on a completed state, it did send the notification - only it was 15 minutes after the run finished instead of close to immediate. All the runs I’ve done since are also working, except that there’s a 15-minute delay.
w
Got it, thanks and sorry for the snag there, let me have someone on the team look into it.
c
Thanks for that report, Chris. I'm very sorry about that. During the timeframe when you were experiencing that on Friday afternoon, I had released a change to improve the reliability of the automation triggering system, but it ended up creating a bottleneck that delayed message processing. It took until around 5:30-6pm ET to get it back to a steady state. Have you observed any problems since then?
c
Thanks for checking up. I believe everything has been going through well since then.
c
That's good to know! For more context, we've been putting in some protections about event ordering (like making sure that `Running` events are processed before their subsequent `Completed` events) and it's proving a little tricky with the volume of events we're seeing. Thanks for your patience and let us know if you see any more trouble.
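To make the ordering idea a bit more concrete, here is a rough sketch of one way a consumer could hold back a `Completed` event until the matching `Running` event has been processed. This is only an illustration under assumptions, not Prefect's actual implementation: `Event`, `OrderedProcessor`, and the `PREDECESSOR` table are hypothetical names made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical table: a state maps to the state that must be processed first.
PREDECESSOR = {"Completed": "Running", "Failed": "Running"}


@dataclass(frozen=True)
class Event:
    run_id: str
    state: str


class OrderedProcessor:
    """Holds back an event until its predecessor state has been processed."""

    def __init__(self):
        self.seen = defaultdict(set)      # run_id -> states already processed
        self.waiting = defaultdict(list)  # run_id -> events held back
        self.processed = []               # events in the order they were handled

    def receive(self, event):
        required = PREDECESSOR.get(event.state)
        if required is not None and required not in self.seen[event.run_id]:
            # Predecessor not processed yet: park the event for later.
            self.waiting[event.run_id].append(event)
            return
        self.seen[event.run_id].add(event.state)
        self.processed.append(event)
        # Retry anything that was parked for this run.
        for held in self.waiting.pop(event.run_id, []):
            self.receive(held)


processor = OrderedProcessor()
processor.receive(Event("run-1", "Completed"))  # arrives early, gets held
processor.receive(Event("run-1", "Running"))    # releases the held event
print([e.state for e in processor.processed])
```

A real system would also need a timeout for held events, since a predecessor that never arrives would otherwise block its successors indefinitely, and any holding step like this can add delay when event volume is high.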