A Framework for Implementing Software Analytics

Yesterday, in his opening keynote for FutureStack15, New Relic founder and CEO Lew Cirne described our vision for embedding software analytics across our product line. Today I’d like to share some thoughts and examples drawn from how our most advanced customers use software analytics and New Relic products to optimize their software-powered businesses.

These leading companies implement Software Analytics by monitoring at three levels:

  1. Performance of applications and their infrastructure, which underpins…
  2. Customer experience, which influences…
  3. Business outcomes, which is what we’re ultimately working to optimize.

They use Software Analytics for proactive improvement of their apps, as well as for reactive responses to app problems. But how does this work in practice?

Here’s an example: Suppose a cross-functional software team of developers, app owners, and operations folks learns of a performance problem in an app. Whether the notification comes from an alert, a support ticket, social media, or some other source, the team will first ask a series of questions spanning all three disciplines in order to establish context and understand how best to dive into root-cause analysis.

The Spiral Method

To assess the impact on business outcomes, they’ll ask which companies and individuals have encountered the issue, and how it has affected revenue and conversion funnels.

To gauge the impact on customer experience, they’ll measure just how slow, or error-prone, the app is for end users. Then they’ll look for variance by geography, device, network/carrier, and front-end asset. They’ll also break down response times across the frontend, the network, and the backend.

To explore factors around app performance, they’ll look for slow-performing or overloaded backend applications, API endpoints, microservices, databases, hosts, containers, and other components of their application infrastructure.
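
A question like that last one might translate into a query such as this sketch, which ranks backend services by average response time using the standard duration and appName attributes on APM Transaction events (the time window is just an illustration):

SELECT average(duration) AS 'Avg Response Time' FROM Transaction FACET appName SINCE 30 minutes ago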

We call this process “spiraling in,” and it lets you get at any number of critical questions surrounding your software, including:

[Image: spiral method graph]

The framework is powerful because it helps software teams focus on the right priorities. For example, troubleshooting an app may not be that important if the app isn’t impacting much revenue or affecting your biggest customers.

Spiraling requirements

Establishing context around how to improve your applications requires several key capabilities:

First, you need a single unified set of data that spans all three disciplines (business outcomes, customer experience, and app performance). If you use a point solution that measures only conversion rates, for example, you might learn that conversion is dropping, but you may not be able to easily determine which mobile or web apps influence those conversion rates. And if you rely on a point solution that measures only mobile customer experience, say, you might discover that a mobile app is providing a poor customer experience, but you may not be able to quickly tie that to the backend on which the mobile app depends.

Second, success depends on the ability to analyze across a range of dimensions. Suppose you run an e-commerce company that sells movie tickets. To measure business outcomes, you’d want to track the opportunity cost of failed transactions by adding Insights custom attributes called ticket_price and transaction_status to your Transaction events and running this NRQL query:

SELECT sum(ticket_price) AS 'Opportunity Cost' FROM Transaction WHERE appName = 'Purchase Service' AND transaction_status = 'failure'

That query would display your opportunity cost, in real time, down to the penny:

[Image: opportunity cost]
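
Because the loss is computed from event data, the same attributes can also answer the business question of which customers are affected. As a sketch, faceting the sum by a customer identifier would break the lost revenue out per customer (customer_name here is a hypothetical custom attribute you’d have to add yourself, just like ticket_price):

SELECT sum(ticket_price) AS 'Opportunity Cost' FROM Transaction WHERE appName = 'Purchase Service' AND transaction_status = 'failure' FACET customer_name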

To measure customer experience, you’d want to add another custom attribute, mobile_OS, then count how many failed transactions your app experienced for each mobile OS, using the following NRQL query:

SELECT count(*) AS 'Failed Transactions' FROM Transaction WHERE appName = 'Purchase Service' AND transaction_status = 'failure' FACET mobile_OS

That would display your failure count for each mobile platform:

[Image: failed transactions]
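
A raw count can be skewed by how many users are on each platform, so you might also compare failure rates. Here’s one way to sketch that with NRQL’s percentage() function, reusing the same attributes:

SELECT percentage(count(*), WHERE transaction_status = 'failure') AS 'Failure Rate' FROM Transaction WHERE appName = 'Purchase Service' FACET mobile_OS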

Finally, to measure application performance, you’d want to add yet another custom attribute called transaction_error_code and count the number of failed transactions broken out (faceted) by error code, using this NRQL query:

SELECT count(transaction_error_code) FROM Transaction WHERE appName = 'Purchase Service' AND transaction_status = 'failure' FACET transaction_error_code TIMESERIES AUTO

That would tally failed transactions by the error code returned by your payment gateway, like this:

[Image: failed transactions chart]

This chart would give a developer who understands those codes valuable clues about what is going wrong.

Third, you need the ability to explore the data iteratively, in real time, to detect patterns. In our ticket-company example, you might analyze purchase errors by various combinations of mobile OS, browser, geography, carrier/network, and payment provider. Exploring the combinations across those five dimensions means running a lot of queries. If each query takes a long time to run, your company will continue to bleed revenue as you try to determine where the problem lies. The New Relic Software Analytics Cloud is designed to typically return query results in less than a second, making iterative exploration fast and rewarding.
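
Iteration here means refining the question query by query. For instance, having seen that one mobile OS dominates the failures, you might pin that dimension in the WHERE clause and facet by the next one (payment_provider, like the other attributes in this example, is a hypothetical custom attribute):

SELECT count(*) AS 'Failed Transactions' FROM Transaction WHERE appName = 'Purchase Service' AND transaction_status = 'failure' AND mobile_OS = 'iOS' FACET payment_provider SINCE 30 minutes ago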

Bulldozing haystacks

Given the complexity of modern software architectures, with literally hundreds of app instances, hosts, and containers, finding the root cause of application problems isn’t like finding a needle in a haystack. It’s more like finding a needle in hundreds of haystacks. Being able to iteratively explore unified software data across multiple dimensions lets you quickly rule out hundreds of haystacks and focus on the right one for your root-cause analysis.
