Recently our VP of Engineering, Jim Gochee, posted a blog entry about our successful effort to improve the performance of our flagship product, New Relic RPM. Reading how Jim and team used Apdex as a key metric for measuring the before-and-after effects of the optimization effort got me thinking about just how important Apdex has become to our business over the past year.
Way back when we first introduced RPM as the first SaaS-based application performance management tool, we got a lot of great feedback from our customers in the Ruby community. (We only supported Ruby then; we now also support Java.) Ruby developers are such a shy and retiring bunch that we practically had to pry their opinions out of them. (OK, not really.) One of the comments we got went something like “you know, measuring average response time and throughput is great, but averages sometimes hide the fact that some of my transactions are really slow and affect customer experience. You ought to measure Apdex scores.” We looked into it, liked what we saw, and said “You’re right, we should.” Fast forward to now, and Apdex has become an essential component of RPM and a critical metric for measuring our service levels and business goals.
One of the things that makes Apdex so compelling is its simplicity. It’s easy to measure, it’s an easy number to look at, and it’s still deep enough to give a realistic indicator of our service levels. Apdex becomes really meaningful when you look at the different levels or “zones” of application responsiveness (Satisfactory, Tolerating, and Frustrating) and then apply those to your own application. For example, with Jim’s RPM optimization effort mentioned above, you can see from the SLA report how Apdex helped us benchmark the responsiveness of our app:
For the week of 11/24, average response time had changed only slightly, but 6% of our transactions were Frustrating. Three weeks later we had cut that in half. At the same time we raised the number of transactions in the Satisfactory range by 15 percentage points. Average throughput and response times are important metrics but need to be combined with the “bucketing” scheme of Apdex to reveal the real user experience.
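To make the bucketing concrete, here is a minimal sketch of the scoring scheme as the Apdex standard defines it: pick a target response time T, count a sample as satisfied if it finishes within T, tolerating if it finishes within 4T, and frustrated otherwise, then score (satisfied + tolerating / 2) / total. The function names, the 0.5-second threshold, and the sample timings below are made up for illustration; this is not RPM's code or data.

```python
from typing import Iterable


def apdex_zone(response_time: float, t: float) -> str:
    """Classify one response time (seconds) against the target threshold t."""
    if response_time <= t:
        return "satisfied"
    if response_time <= 4 * t:
        return "tolerating"
    return "frustrated"


def apdex_score(response_times: Iterable[float], t: float) -> float:
    """Return (satisfied + tolerating / 2) / total for the given samples."""
    counts = {"satisfied": 0, "tolerating": 0, "frustrated": 0}
    for rt in response_times:
        counts[apdex_zone(rt, t)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("Apdex is undefined without at least one sample")
    return (counts["satisfied"] + counts["tolerating"] / 2) / total


# Hypothetical response times in seconds, with an assumed threshold T = 0.5 s.
samples = [0.2, 0.3, 0.4, 0.9, 3.0]
print(sum(samples) / len(samples))   # mean response time, just under 1 second
print(apdex_score(samples, t=0.5))   # 0.7: three satisfied, one tolerating, one frustrated
```

The mean alone gives no hint that one request in five took three seconds. The zone counts make that visible, which is the same effect described above for the SLA report.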
On a daily basis, we use Apdex to monitor whether specific transaction types are meeting SLAs. Keeping RPM performing well is no easy task, considering that we collect more than two billion metrics per day. We provide real-time data to our user base, which means that every page we serve up is a unique request. So some traditional techniques like caching aren’t an option for us. Our application overview page offers a great example of the type of unique page that we have to serve up to hundreds of customers every day.
This page is the starting point for our customers’ monitoring efforts, meaning that they use it daily, hourly, or more. It’s open all the time and it’s constantly updating metrics. If we don’t deliver this page fast, we start to hear from our customers. And there’s a lot going on here. It’s no trivial matter to render graphs and tables of performance metrics in a second or less. We set a target Apdex score of 0.9 for the page, which is very challenging but critical for maintaining customer satisfaction. Apdex gives us an objective measurement by which we can be confident we are meeting user expectations for service delivery.
But how is Apdex a core business metric for us? Well, as the guy who gets to report to the Board every month on the health of the business, I live in Apdex. For a web-based business, application performance is about ensuring that the product you sell (a web transaction) is of high quality. It’s like the guy who sells apples making sure each one is red, shiny and delicious. We designed RPM’s SLA report to be a vehicle for showing how the business is progressing. That report, which includes Apdex, is how I communicate with our shareholders. Of course, we pay attention to metrics like number of signups and RPM usage, but the inclusion of Apdex makes the report really meaningful because we can show the direct impact that customer satisfaction is having on business growth. Let me discuss this a little more in the context of some recent research.
Last year at the Velocity conference, Google and Microsoft gave a presentation called “The User and Business Impact of Server Delays, Additional Bytes, and HTTP Chunking in Web Search.” In it, they describe a very real connection between web performance and business growth. Both Google and Microsoft regularly test web performance by taking random samples of users and purposely slowing their transaction response times. The results are astonishing. A two-second delay caused revenue to drop by 4.3%. Even delays under 0.5 seconds had negative revenue impacts. These delays were shown to have persistent and increasing negative effects over time (such as an increasing abandonment rate). Another experiment showed that when the delays were eliminated, it took weeks for the test group’s behavior to return to normal. This means that revenue continued to be impacted even though performance was back to acceptable levels. We take metrics like these very seriously, and that’s why we set a very high goal for service delivery.
In short, Apdex has become a critical benchmark for measuring and forecasting business growth, customer satisfaction, and both the short- and long-term performance of our application. If you are an RPM customer, I urge you to add Apdex to the key metrics for managing your business. I think you’ll appreciate how it can help your business (and your customers will appreciate it too!).
If you have any questions about how we use Apdex, feel free to drop me a note at email@example.com.
For more information about Apdex, please visit Apdex.org, where you’ll find detailed descriptions of how it works and how you can use it to effectively measure customer satisfaction.
-Lew Cirne, New Relic CEO