This post is adapted from an entry in the New Relic Community Forums. Please feel free to join the discussion there!
A few months ago, New Relic announced the open beta rollout of a brand-new Alerts platform—and since then you might have noticed the Alerts beta tab in your New Relic UI. As the Alerts beta continues, I’ve realized that we aren’t all speaking the same language when we throw around terms like “notification” or “critical threshold” in support tickets and in New Relic Community Forum responses—so I wrote this handy Alerts Terminology Primer to help get everyone on the same page.
I will approach this from the point of view of someone who has just signed up for the Alerts beta, and introduce the terms in the order that a new user might first encounter them.
Let’s start with the term “Alert” since that is ostensibly what we’re talking about. Here, an Alert is not a message you get telling you that something has gone wrong. From this point on, “Alert” will be used to refer to the Alerts beta (v3) and the concept of alerting in general. It will all make sense by the end, I promise.
So you’ve just opted into Alerts. Congratulations! The first thing you need to get started is a Policy. So you click on Alert policies, come up with a meaningful name, and press Create policy.
You’re off to a great start! But what have you created? You can think of a Policy as a container for Conditions and Notification Channels (more on those later). For now, just know that the Policy acts as a way to collect potentially related Conditions and Notification Channels. You don’t have to put them together in a single Policy, but doing so is a much cleaner way to keep track of them than creating a separate Policy for every Condition.
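The container relationship described above can be sketched as a tiny data model. This is purely a conceptual illustration of the terminology, not New Relic’s actual implementation; every class and field name here is hypothetical.

```python
# Conceptual sketch of the Alerts object model (hypothetical names;
# not New Relic's actual implementation).

class Policy:
    """A Policy is a container for Conditions and Notification Channels."""

    def __init__(self, name):
        self.name = name
        self.conditions = []             # e.g. "Throughput above 200 rpm"
        self.notification_channels = []  # e.g. an email address

    def add_condition(self, condition):
        self.conditions.append(condition)

    def add_channel(self, channel):
        self.notification_channels.append(channel)

# Group related Conditions and Channels in one Policy rather than
# creating a separate Policy for every Condition.
policy = Policy("My test policy")
policy.add_condition("Throughput (Web) above 200 calls for at least 5 mins")
policy.add_channel("email: ops@example.com")
```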
That brings us to the contents of the Policy container: Conditions and Notification Channels. I like to start with Notification Channels because without them you are unlikely to know that anything is happening. The Notification Channels tab in the gray navigation bar will take you to the page where you can find the Create a notification channel button.
The simplest channel, in my opinion, is email because it doesn’t involve any additional setup. So click Create a notification channel, select E-mail from the drop-down menu, add your email address, and click Create channel.
Now we’re ready to create an Alert Condition! Select your new Policy, click on the tab that says 0 Alert conditions, then press Create a condition.
This button reveals a lot of options. Fortunately, the first row of choices purposely mimics the tabs in the top navigation bar that correspond to New Relic’s different products. Hopefully, you have an app reporting to APM, or something else for which you would like to create an Alerts Policy. (If not, you can still follow along but you may not get as much out of this discussion.) I am going to choose APM as my product and Application metric as my type of condition.
Here again my choice is guided by simplicity—my chief goal here is to explain key terms through examples, not to cover every aspect of Alerts. Next, select your targets. These are the apps we will be looking at when this Alert Condition is evaluated.
Next, define thresholds (hit that button!).
Here’s the next new term to introduce: Critical Thresholds. The easiest way to think of Critical Thresholds is to imagine a line in the sand. When this line is crossed, either once or for a set period of time (more on that in a moment), things really kick into action. In the UI, a Critical Threshold is the small red octagon with an X inside it, found beside an empty text box. Let’s fill in some boxes and then I’ll talk a little more about Critical Thresholds and what happens when they are violated.
I like setting up test Alert policies with just one Condition, and I like to set that Condition to Throughput, since it is a straightforward metric for which it is easy to generate data that either violates or does not violate a Threshold. So, for example, I would select the following: Throughput (Web) has a call count above 200 calls for at least 5 mins. What this translates to is “When my app has more than 200 requests per minute for at least 5 minutes, I would like to consider that a violation of a Critical Threshold.”
In order to cross my line in the sand, I am going to need to step over it and stay there for a little while. Because I chose “for at least 5 mins,” I must have 5 consecutive minutes of data reporting in violation of the Critical Threshold to trigger a notification. This is a really crucial point! If you choose a metric that sometimes reports a 0 value, those minutes break the consecutive run, and Alerts will not consider your Critical Threshold to have been violated!
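The “for at least 5 mins” logic above can be sketched in a few lines of code. This is an illustrative sketch of the consecutive-minutes idea, not New Relic’s actual evaluation code; the function name and signature are invented for the example.

```python
# Sketch of how a "for at least 5 mins" Critical Threshold could be
# evaluated (hypothetical function; not New Relic's implementation).

def threshold_violated(samples_per_minute, threshold=200, duration=5):
    """Return True only if the last `duration` consecutive one-minute
    samples are all above `threshold`."""
    if len(samples_per_minute) < duration:
        return False
    return all(value > threshold for value in samples_per_minute[-duration:])

# Five consecutive minutes above 200 rpm: the threshold is violated.
print(threshold_violated([250, 260, 255, 270, 265]))  # True

# A single 0-value minute breaks the consecutive run, so there is no
# violation, even though the other four minutes were well over the line.
print(threshold_violated([250, 260, 0, 270, 265]))    # False
```

The second call is the crucial-point case from the paragraph above: one gap in the data resets the consecutive count.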
While I was writing all this, I had my app refresh lots of elements every 20 seconds in order to generate some data and violate the Critical Threshold of the Alerts Policy Condition. If I were to navigate to the Alerts beta page by clicking Alerts beta in the navigation bar above, I would see that I have an open Incident (which is also our final new vocabulary word for the day).
An Incident is another kind of container. When a Critical Threshold is violated, an Incident gets created and events related to it are put inside the container. These events can be violations of Critical Thresholds, notifications sent, the opening and closing of the Incident, and acknowledgements of the Incident (among other things). Note that an Incident will be created only when a Critical Threshold is violated, but other related events will still show up inside it.
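Continuing the conceptual model from earlier, an Incident can be sketched as an event container that opens on a Critical Threshold violation. Again, all names here are hypothetical illustrations of the terminology, not New Relic internals.

```python
# Conceptual sketch of an Incident as an event container (hypothetical
# names; not New Relic's implementation).

class Incident:
    """Opens when a Critical Threshold is violated and collects
    related events until it is closed."""

    def __init__(self):
        self.open = True
        self.events = ["critical threshold violated"]  # the opening event

    def record(self, event):
        self.events.append(event)

    def close(self):
        self.record("incident closed")
        self.open = False

# Related events accumulate inside the Incident over its lifetime.
incident = Incident()
incident.record("notification sent: email")
incident.record("incident acknowledged")
incident.close()
```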
Hopefully, the way we use the terms “Alert,” “Policy,” “Critical Thresholds,” “Conditions,” “Notification Channels,” and “Incidents” now makes more sense to you, and you have a better understanding of how to navigate through the New Relic Alerts interface.
Be sure to join the discussion about the Alerts beta in the New Relic Community Forums.