One thing we love about our new Dynamic Baseline Alerts, now in limited release, is how fast and easy it is to set up an alert with confidence. From the beginning, we knew that we wanted to give folks a preview so they could see exactly how the alert would respond to their actual data. But we quickly realized that customer data varies so much that we needed to do something pretty smart to make it work well for everyone.
That’s what drove us to create the asymmetrical sensitivity slider in the Baseline Alerts feature of New Relic APM. The slider is designed to help you graphically select a threshold controlling when you’ll receive an alert, based on how far a given metric deviates from the baseline.
The width of the alert band changes as you move the slider. Slide it to the right (toward alerting on fewer violations), and the band expands so that you're alerted only on increasingly significant anomalies. Slide it to the left (toward alerting on more violations), and you'll get alerts on less dramatic deviations from the baseline.
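To make the band idea concrete, here is a minimal sketch of how a band around a baseline can decide whether a data point counts as a violation. It assumes a symmetric band (the same distance above and below the baseline), and the `BaselineBand` class and its fields are hypothetical names for illustration, not part of the New Relic product or API.

```python
from dataclasses import dataclass


@dataclass
class BaselineBand:
    """Hypothetical alert band centered on a baseline prediction (symmetric by assumption)."""
    baseline: float    # predicted value for this time slice
    half_width: float  # distance from the baseline to either band edge

    def is_violation(self, observed: float) -> bool:
        # A point violates the threshold when it falls outside the band,
        # i.e. it deviates from the baseline by more than the half-width.
        return abs(observed - self.baseline) > self.half_width


narrow = BaselineBand(baseline=200.0, half_width=10.0)  # slider toward the left
wide = BaselineBand(baseline=200.0, half_width=50.0)    # slider toward the right

# A 230 ms response time is 30 ms from the baseline: outside the narrow band only.
print(narrow.is_violation(230.0))
print(wide.is_violation(230.0))
```

Widening the band (moving the slider right) raises the deviation needed to trigger an alert, which is why the right end alerts on fewer violations.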
How we improved it
Our first iteration of the sensitivity slider in Dynamic Baseline Alerts was an equal-interval scale. As you moved the slider, the alert band expanded or contracted in a simple linear relationship with the slider position. But when we tested that early version internally, we discovered that it worked well for some metric time series but not so well for others. That's because different metrics show different amounts of variation. For example, disk usage is typically fairly predictable, whereas CPU utilization is often less predictable as different processes start and stop. For metrics with a large normal range, a simple linear relationship made it difficult to set the alert band to catch anomalies without generating alerts on less significant issues. We decided we needed to go beyond a basic equal-interval slider to make it work well for the wide variety of data we, and our customers, encounter.
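For comparison, an equal-interval slider can be sketched as a direct proportion between slider position and band width. This is an illustrative assumption about what a linear mapping looks like, not New Relic's original code; `linear_band_width`, the 0-to-1 position range, and `max_width` are hypothetical.

```python
def linear_band_width(position: float, max_width: float) -> float:
    """Hypothetical equal-interval mapping: width is proportional to slider position.

    position is assumed to run from 0.0 (left end) to 1.0 (right end).
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    return position * max_width


# A 10% slider move changes the width by the same amount at either end,
# so there is no extra precision on the left and no extra reach on the right.
step_left = linear_band_width(0.2, 500.0) - linear_band_width(0.1, 500.0)
step_right = linear_band_width(1.0, 500.0) - linear_band_width(0.9, 500.0)
print(step_left, step_right)
```

Because every step has the same size, one fixed `max_width` has to serve both predictable and noisy metrics, which is the limitation described above.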
Rather than have the slider control the band linearly, we designed the left end of the slider to provide finer adjustments, and the right end of the slider to deliver bigger adjustments. For example, on a response time Baseline Alert for one application, a small change on the left end of the slider might widen the band by 1 millisecond. The same change on the right end of the slider might widen the band by 100 milliseconds.
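To put rough numbers on that example, suppose (as an illustrative assumption) that the band half-width $w$ grows exponentially with slider position $x$, $w(x) = b\,e^{x}$, where $b$ is a per-metric base width. The same small slider move $\Delta x$ then produces a width change that depends only on where the move starts:

$$
\Delta w(x) = w(x + \Delta x) - w(x) = b\,e^{x}\left(e^{\Delta x} - 1\right),
\qquad
\frac{\Delta w(x_2)}{\Delta w(x_1)} = e^{x_2 - x_1}.
$$

Under this assumption, for the same move to widen the band by 1 ms at one position and by 100 ms at another, the two positions would need to be $\ln 100 \approx 4.6$ units apart on the slider.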
How does it all work? It’s a two-step process.
Custom slider scales
First, we set the scale of the slider to depend on the predictability of the metric being observed. In other words, the base unit for the slider is custom to that particular data set. Think of that base unit as the width of the smallest band you can select.
Why did we do it that way? Basing the unit on predictability makes the entire slider range more useful. Before we made that adjustment, we often found ourselves trying to adjust the band for a given metric by working with just a small part of the slider range. That's because metrics that are less predictable need a broader base width so that you can configure an alert that catches the anomalies, not the normal variation. But when you're dealing with a more predictable metric, you want a narrower band so you can set a tighter threshold. With custom base units, the range of the slider automatically adjusts to make it easier to configure the alert threshold.
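The post doesn't say how predictability is measured. As one plausible stand-in, the sketch below uses the population standard deviation of past errors around the baseline as the per-metric base unit; that choice, the `base_unit` function, and the sample data are all hypothetical.

```python
import statistics


def base_unit(observed: list[float], baseline: list[float]) -> float:
    """Hypothetical per-metric base unit: the spread of past errors around the baseline.

    A predictable metric (small residuals) gets a small base unit;
    a noisy metric (large residuals) gets a large one.
    """
    if len(observed) != len(baseline) or len(observed) < 2:
        raise ValueError("need two or more paired observations")
    residuals = [o - b for o, b in zip(observed, baseline)]
    return statistics.pstdev(residuals)


baseline = [50.0, 50.0, 50.0, 50.0, 50.0, 50.0]
disk = [50.5, 49.5, 50.5, 49.5, 50.5, 49.5]  # hugs the baseline
cpu = [70.0, 30.0, 65.0, 35.0, 80.0, 20.0]   # swings widely around it

print(base_unit(disk, baseline))
print(base_unit(cpu, baseline))
```

Scaling the slider by this unit means the narrowest selectable band for the noisy series is already wide enough to ignore its normal swings, while the predictable series still gets a tight starting band.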
An exponential improvement
Then, we coded the slider to scale not linearly but according to the natural exponential function, $y = e^x$, as shown in the graph below:
This curve gives us our desired effect: fine-grained control at one end, and the ability to quickly input more dramatic changes at the other.
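Here is a minimal sketch combining the two steps: the band width is the per-metric base unit multiplied by $e^x$ for slider position $x$, so position 0 gives exactly one base unit, the smallest band. The function name and the 0-to-5 slider range are assumptions for illustration, not the shipped implementation.

```python
import math


def exponential_band_width(position: float, unit: float) -> float:
    """Hypothetical mapping: band width = base unit * e**position (y = e^x).

    At position 0 the width equals the base unit, the smallest band you can select.
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    return unit * math.exp(position)


unit = 1.0  # e.g. a 1 ms base unit for a predictable response-time metric
step = 0.1  # the same small slider move, applied at each end

left_change = exponential_band_width(0.0 + step, unit) - exponential_band_width(0.0, unit)
right_change = exponential_band_width(5.0 + step, unit) - exponential_band_width(5.0, unit)
print(f"{left_change:.3f} ms at the left end, {right_change:.3f} ms at the right end")
```

With this assumed range, the same move changes the band about $e^5 \approx 148$ times more at the right end than at the left, which matches the fine-then-coarse behavior described above.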
This design let us set Dynamic Baseline Alerts to work well with a wide variety of real-world metrics.
We knew we’d have to use math to create the baseline predictions that support Dynamic Baseline Alerts in New Relic APM. It was a bonus when the data nerds at New Relic got to dig into some cool math to make the user interface just as awesome as the baselines themselves.
Dynamic Baseline Alerts is currently in limited release and is scheduled to be widely available later this year.