When New Relic introduced NRDB—our purpose-built, multi-tenant analytics database—back in 2014, we fundamentally changed the way our customers interact with their monitoring data. Being able to ask questions of their data by slicing and dicing it in real time to diagnose software issues is extremely powerful. It can reveal trends, patterns, and root causes that can go unseen with single-dimension, aggregate metrics only.

Then, late last year at FutureStack16, we previewed a powerful new capability called NRQL alerts, which lets customers configure alerting conditions to get proactive, near real-time notifications leveraging the rich, dimensionally deep data used by New Relic Insights. (NRQL refers to the New Relic Query Language.)

Since then, hundreds of customers have been putting NRQL alerts through their paces. And now, as we get close to release, it is a good time to spend a few minutes exploring how this feature promises to change the way you think about operationalizing on your New Relic data.

The basics

With NRQL alerts, customers can write time series queries using NRQL and alert on the results. A “time series query” is basically any query that returns a number, so you can use aggregate functions like count(), average(), percentile(), sum(), min(), max(), stddev(), and uniqueCount().

Here is an example query to alert on the median (50th percentile) of transactions responding with a 200 response code:

SELECT percentile(duration,50) FROM Transaction WHERE appName = 'Storefront' and httpResponseCode = '200'

Then, with the flexibility of NRQL, customers can easily home in (by adding filters) on an even more precise signal for alerting. For instance, adding the following filters changes the data we’re looking at pretty significantly:

SELECT percentile(duration,50) FROM Transaction WHERE appName = 'Storefront' and httpResponseCode = '200' and externalCallCount > 0 and transactionType = 'Web'

Now the query returns only results for web transactions that include calls to external services.

True operational flexibility

When you install a New Relic agent, you immediately start collecting incredibly valuable performance data with powerful dimensionality like the attributes in the queries above (httpResponseCode, externalCallCount, transactionType). However, customers can easily take this even further by leveraging the custom attribute API to decorate their data with domain-specific dimensions of particular interest to their business, such as user types, account segments, and identifiers.

For example, here’s an APM + Java Agent code sample that can be added to any transaction. It allows you to filter and facet your data using dimensions important to you:

NewRelic.addCustomParameter("userId", userId);           // 1234
NewRelic.addCustomParameter("userRole", userRole);       // user, admin
NewRelic.addCustomParameter("accountType", accountType); // silver, gold

Systems or business KPIs

Using NRQL, you can filter directly to the data that best represent the system, service, or business KPIs that matter to you, and alert on it. And this capability isn’t just for dev and ops. New Relic customers in marketing positions are already monitoring the stream of revenue being processed by their shopping cart systems.

Here’s a real-world APM + Java Agent example that adds the item total to transaction events:

NewRelic.addCustomParameter("itemTotal", itemTotal); // 1234.31

By simply adding itemTotal as a custom attribute to the ‘WebTransaction/web/orders’ transaction of your Order Service code, you can use New Relic to operationalize on business performance. Pretty incredible.

NRQL itemTotal condition screenshot example

This screenshot shows what a NRQL itemTotal condition looks like in New Relic Alerts.

How it works: an example

What do you get when you combine the power of NRDB, NRQL, and New Relic Alerts?

Streaming query alerts!

Let’s take a look at how it works. Your first step is to craft a query that returns a number you’re interested in. Say you have a business-critical service processing transactions around the clock. You’ve deployed this app behind a load balancer with 10 Java virtual machines and want to know if, at any time, fewer than 10 JVMs are processing requests for a sustained amount of time.

First, let’s write a query to answer the initial question “How many JVMs are serving transactions right now?” We know that New Relic APM’s Transaction event type has the host attribute on it. The query below counts the unique hosts creating transaction events each minute. In this case, I know that my “Storefront” app is served by four app instances, and it looks like the data agrees with me:

NRQL query screenshot example



Now let’s port this over to a NRQL condition. I’ve got the NRQL in place and the data preview still looks good. Then I just need to set my threshold parameters to trigger when five consecutive minutes go by with fewer than four app instances processing transactions. Piece of cake, right?

NRQL condition screenshot example


Now I’ll get notified whenever there is “Not enough Storefront instances!”
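If it helps to see that threshold logic spelled out, here is a minimal Java sketch of the idea: count the unique hosts seen in each minute, then check whether five consecutive minutes fall below four. This is only an illustration of the rule described above, not how New Relic Alerts evaluates conditions internally. The class and method names (InstanceCountSketch, uniqueHosts, violates) and the host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical.
final class InstanceCountSketch {

    // Roughly what uniqueCount(host) returns for a single one-minute window.
    static int uniqueHosts(List<String> hostsSeenInMinute) {
        return new HashSet<>(hostsSeenInMinute).size();
    }

    // True once `minutes` consecutive per-minute values are below `threshold`.
    static boolean violates(List<Integer> perMinuteCounts, int threshold, int minutes) {
        int run = 0;
        for (int count : perMinuteCounts) {
            run = (count < threshold) ? run + 1 : 0; // any healthy minute resets the streak
            if (run >= minutes) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One healthy minute with four hosts, then five minutes with only three.
        List<List<String>> minutes = new ArrayList<>();
        minutes.add(Arrays.asList("host-1", "host-2", "host-3", "host-4", "host-1"));
        for (int i = 0; i < 5; i++) {
            minutes.add(Arrays.asList("host-1", "host-2", "host-3"));
        }

        List<Integer> counts = new ArrayList<>();
        for (List<String> hosts : minutes) {
            counts.add(uniqueHosts(hosts));
        }
        System.out.println("Not enough Storefront instances? " + violates(counts, 4, 5));
    }
}
```

Note that a single minute back at four hosts resets the streak, which is what keeps a brief blip from opening a violation.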

Virtually unlimited use cases

At this point you might ask, “Okay, so I get this is powerful for system or business ops, but I have a terrible imagination. Can you just give me a giant list of things I can do with this that might inspire me?” Yep, you got it!

Inspired by the hundreds of New Relic customers (in early access) already using this feature, here are some useful examples:

Arbitrary response time percentiles: It’s fast and easy to use NRQL to query the raw event data in NRDB. Having the raw data lets you compute arbitrary percentiles as needed, whether it’s the 50th, 90th, or even the 93rd percentile.

As we’ve said in the past, “averages are great, but sometimes they don’t tell the whole story because in a typical web application outliers will drag the average up.” The sample NRQL queries below will give you the time it takes for 95% of the transactions to complete in the last minute.

APM app server transaction duration:

SELECT percentile(duration, 95) FROM Transaction WHERE appName = 'Storefront' and transactionType = 'Web'

Browser client side page view durations:

SELECT percentile(duration, 95) FROM PageView WHERE appName = 'Storefront'
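To make the difference between an average and a percentile concrete, here is a small Java sketch using made-up durations. It uses the simple nearest-rank definition of a percentile, which is an assumption chosen for clarity and not necessarily how NRDB computes percentile(). The class name PercentileSketch and the sample values are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only; nearest-rank is an assumed percentile definition.
final class PercentileSketch {

    static double average(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
    static double percentile(double[] values, double p) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Nineteen fast transactions (0.2 s) and one slow outlier (10 s).
        double[] durations = new double[20];
        Arrays.fill(durations, 0.2);
        durations[19] = 10.0;

        System.out.println("average: " + average(durations));
        System.out.println("median:  " + percentile(durations, 50));
        System.out.println("p95:     " + percentile(durations, 95));
    }
}
```

With one slow outlier among twenty samples, the average lands well above what most users experienced, while the median and 95th percentile stay at the typical value.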

Synthetics monitor check data: From multi-location failures to page load times, page size over time, and more, New Relic Synthetics puts a ton of data into NRDB, all ripe for querying. Here are some example queries:

Monitor “Storefront Homepage” 95th percentile check duration is taking too long

SELECT percentile(duration, 95) FROM SyntheticCheck WHERE monitorName = 'Storefront Homepage'

Monitor “Storefront Homepage” total page size points to unoptimized assets

SELECT average(totalResponseBodySize) FROM SyntheticCheck WHERE monitorName = 'Storefront Homepage'

Monitor “Storefront Checkout” has high percentage of sub-requests failing

SELECT percentage(count(*),WHERE responseStatus != 'OK') FROM SyntheticRequest WHERE monitorName = 'Storefront Checkout'

Monitor “Storefront Homepage” is failing to load JavaScript files

SELECT count(*) FROM SyntheticRequest WHERE monitorName = 'Storefront Homepage' and responseStatus != 'OK' and path like '%.js'

Monitor “Storefront” has multiple locations failing checks*

SELECT uniqueCount(location) FROM SyntheticCheck WHERE monitorName = 'Storefront' and result = 'FAILED'

*To use the above query, customers must set two condition options in the Alerts UI—combining the query above with the “sum of query results” and a wider duration (15 minutes, for example).

Mobile interactions: Using New Relic Mobile to monitor the usage trends of your mobile app can tip you off about issues with your app builds as well as problems with your support backend services. Here’s a great example:

No one is logging into the mobile app for an extended amount of time

SELECT count(*) FROM Mobile WHERE name = 'AppName.MobileApp.QuickLogin'

Browser client-side API tracking: There is a lot of interesting data in New Relic Browser’s PageView, PageAction, and AjaxRequest event types that can tell you a lot about what is going on in your users’ mobile or desktop client. For example:

Percentage of Page Actions and Ajax Requests resulting in 500s is high

SELECT percentage(count(*), WHERE actionName is not null) from PageAction, AjaxRequest WHERE appId = 1234 and ((actionName = 'AjaxError' and httpResponseCode >= 500 and httpResponseCode <= 599 and requestUrl like '%/api/commerce/order/%') or (actionName is null and requestUrl like '%/api/commerce/order/%'))

Custom event types: Some of our most advanced customers make heavy use of the Insights Insert API. They are creating their own event types, events, and attributes, which is another way of saying the power of NRQL alerting is virtually limitless. Check out these fun and interesting use cases:

The example below counts completed e-commerce orders. If the count drops to 0 over the past 30 minutes during regular business hours, you can infer that users are unable to place an order, even if your system isn’t reporting any failures. You can slice and dice this further to detect problems that might affect only specific platforms (iOS vs. Android). Note that the “MobileMetrics” event type is a custom type created to store important e-commerce events from mobile apps.

SELECT uniqueCount(ecommerce_id) FROM MobileMetrics WHERE order_id IS NOT NULL
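For a rough idea of what feeding a custom event type like this involves, here is a minimal Java sketch that builds a JSON payload for one MobileMetrics event. It assumes the Insights Insert API accepts a JSON array of flat event objects, each carrying an eventType attribute. The class name, the platform attribute, and the sample values are hypothetical, and actually sending the payload (endpoint, insert key, HTTP client) is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; builds a payload and does not send it anywhere.
final class MobileMetricsPayloadSketch {

    // Serializes one flat event as a single-element JSON array.
    // Numbers are written bare; everything else is written as a string.
    static String toJson(Map<String, Object> attributes) {
        StringBuilder json = new StringBuilder("[{");
        boolean first = true;
        for (Map.Entry<String, Object> entry : attributes.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            json.append('"').append(escape(entry.getKey())).append("\":");
            Object value = entry.getValue();
            if (value instanceof Number) {
                json.append(value);
            } else {
                json.append('"').append(escape(String.valueOf(value))).append('"');
            }
        }
        return json.append("}]").toString();
    }

    // Escapes backslashes and double quotes only; control characters are not handled.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("eventType", "MobileMetrics"); // names the custom event type
        event.put("ecommerce_id", "e-1001");     // hypothetical value
        event.put("order_id", "o-2002");         // hypothetical value
        event.put("platform", "iOS");            // hypothetical attribute
        System.out.println(toJson(event));
    }
}
```

In real code you would likely reach for a JSON library instead of hand-rolled serialization; the point here is only the shape of the event that the query above reads.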

Here’s another example, this time using a custom event to save money on cloud storage:

SELECT filter(sum(sent)*0.09/(1024*1024*1024) , WHERE bucket = 'company-downloads') + filter(sum(sent)*0.14/(1024*1024*1024) , WHERE bucket != 'company-downloads') from S3Logs

This is a pretty extreme example, but it illustrates how open-ended this feature is. Imagine you built a custom poller that gets data from Amazon Simple Storage Service (S3) access logs and sends it to the New Relic database (NRDB) using the Insights Insert API. The query prices the bytes sent from the company-downloads bucket at one rate and the bytes sent from every other bucket at another, then adds the two to get the current transfer cost. The idea is that by sending a webhook when that cost rises beyond a certain threshold, a system can be informed to store/retrieve data elsewhere to reduce cost. Interesting, right?
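Here is the same arithmetic as a small Java sketch, in case the nested filter() calls are hard to read. The rates (0.09 and 0.14) and the bucket name come straight from the query; the LogEvent class is a hypothetical stand-in for one S3Logs event, and skipping events with no bucket is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; mirrors the arithmetic in the S3Logs query.
final class S3CostSketch {

    static final double BYTES_PER_GIB = 1024.0 * 1024.0 * 1024.0;

    // Hypothetical stand-in for one S3Logs event.
    static final class LogEvent {
        final String bucket;
        final long sent; // bytes sent

        LogEvent(String bucket, long sent) {
            this.bucket = bucket;
            this.sent = sent;
        }
    }

    // sum(sent) per group, priced at that group's rate, then added together.
    static double transferCost(List<LogEvent> events) {
        long downloadsBytes = 0;
        long otherBytes = 0;
        for (LogEvent event : events) {
            if (event.bucket == null) {
                continue; // assumption: events without a bucket match neither filter
            }
            if ("company-downloads".equals(event.bucket)) {
                downloadsBytes += event.sent;
            } else {
                otherBytes += event.sent;
            }
        }
        return downloadsBytes * 0.09 / BYTES_PER_GIB + otherBytes * 0.14 / BYTES_PER_GIB;
    }

    public static void main(String[] args) {
        List<LogEvent> events = Arrays.asList(
                new LogEvent("company-downloads", 2L * 1024 * 1024 * 1024),
                new LogEvent("some-other-bucket", 1L * 1024 * 1024 * 1024));
        System.out.println("transfer cost: " + transferCost(events));
    }
}
```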

Just the beginning

We are very excited to get this new capability into our customers’ hands. We can’t wait to see all the cool new use cases our customers will come up with.

That said, we are just scratching the surface of the possibilities that open up when we connect NRDB to New Relic Alerts. For the imagination impaired, Nadya Duke Boone explains more in her post on Introducing New Relic’s Dynamic Baseline Alerts.

 

Nate Heinrich is a product manager at New Relic. He has a background in IT management, Web development, and operations. His hobbies include sports that include balls and nets, games of the video variety, and experimenting with machine learning APIs to one day predict something useful.
