When it comes to monitoring the performance of your software and infrastructure, it’s easy to get caught up in the technical details. But although those details are obviously important, the real value of monitoring lies in how it makes your business more successful.
That’s the topic of the latest episode of the New Relic Modern Software Podcast. Special guest Rob Peterson, senior director of business value and strategy at New Relic, joins New Relic Developer Evangelist Tori Wieldt and me to discuss the business value beyond IT of using New Relic, from saving money to boosting revenue.
You can listen to the episode below, get all the episodes automatically by subscribing to the New Relic Modern Software Podcast on iTunes or wherever you get your podcasts, or read on below for a full transcript of our conversation, edited for clarity:
New Relic was the host of the attached forum presented in the embedded podcast. However, the content and views expressed are those of the participants and do not necessarily reflect the views of New Relic. By hosting the podcast, New Relic does not necessarily adopt, guarantee, approve or endorse the information, views or products referenced therein.
Fredric Paul: Before we get started, Rob, can you tell us a little bit about your background and your role at New Relic?
Rob Peterson: My background is heavy-duty finance and accounting. I worked for companies like American Express, eBay, and Quest Communications. In fact, at Quest Communications, I ran a budget of about $1.3 billion. This has given me very deep experience in understanding the business value of New Relic to our customers.
It’s important to understand that this value lies not just in the IT organization. We’ve got to look elsewhere and understand that there are a lot of things that we’re doing that have big impacts on the business.
Move faster, be more efficient, and generate more revenue
Fredric: That’s a great way to talk about the impact of digital transformation and digital infrastructures, not just on IT budgets but on the corporate bottom line. Can you talk about those challenges a little bit?
Rob: There are actually three key areas: First, operate faster and more efficiently. Second, build revenue from digital transformation projects. And third, find new digital opportunities.
The velocity at which our customers are able to produce new features and new functions will have massive impacts on the amount of headcount it takes to produce output. In addition to that, if I’m a software business and I’m able to roll code out faster—new features, new functions—not only am I more competitive, it becomes a competitive advantage to my customers.
That has a direct impact on revenues. Again, if it’s an internal business where I’m rolling out new features and functions on ERP systems or other applications within the business, those things have an impact on productivity to the constituency consuming that system internally. So those are productivity enhancements as well. It isn’t just that we’re more cost effective internally in the IT organization, but there are downstream impacts as well.
Lost Customer Time
Fredric: In a couple of recent blog posts for New Relic, you’ve come up with some equations to quantify this. Perhaps the most interesting one is something called “Lost Customer Time.” Maybe you could talk about what that is and why it’s so important.
Rob: New Relic is first to market with this particular metric. This is a leading indicator of what’s called a Net Promoter Score, which is to say that a business can predict what that Net Promoter Score is going to look like by looking at the amount of lost customer time that its customers are experiencing. New Relic lets them look at the speed of the transactions moving through their IT ecosystem, times the number of errors that they incur, which equals that Lost Customer Time.
Fredric: That seems so simple.
Rob: I know. And it’s a killer when you consider the fact that no one else is looking at this. We’re measuring things like transaction speed, page load times, error rates, visit duration, satisfaction, all of those things. If you want to measure what matters to a customer, measure how much time they’re spending or losing waiting for you to deliver the technology that you want them to use.
Fredric: So Lost Customer Time is a measure of the time that customers are actually not able to do what they want to do in your applications or website.
Rob: That’s right.
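As a rough illustration of the equation Rob describes (transaction speed times the number of errors), the calculation might be sketched like this. The transcript does not give an exact definition, so the `TransactionStats` type, its field names, and the sample figures are all hypothetical, and this is not New Relic's implementation.

```python
from dataclasses import dataclass


@dataclass
class TransactionStats:
    """Hypothetical summary of one transaction type over a reporting window."""
    name: str
    avg_duration_seconds: float  # assumed: average time a customer spends in the transaction
    error_count: int             # assumed: number of failed attempts in the window


def lost_customer_time(stats: TransactionStats) -> float:
    """Seconds lost for one transaction type: duration times number of errors."""
    return stats.avg_duration_seconds * stats.error_count


def total_lost_customer_time(all_stats: list) -> float:
    """Sum the lost time across every transaction type."""
    return sum(lost_customer_time(s) for s in all_stats)


# Made-up sample data for illustration only.
window = [
    TransactionStats("checkout", avg_duration_seconds=4.0, error_count=120),
    TransactionStats("search", avg_duration_seconds=1.5, error_count=300),
]
total_seconds = total_lost_customer_time(window)
print(f"Lost Customer Time: {total_seconds / 60:.1f} minutes")
```

Tracked over time, a figure like this would go down when either transactions get faster or errors become rarer, which is the behavior the metric is meant to capture.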
Fredric: And that has measurable impact on the bottom line, yes?
Rob: Absolutely. That translates directly to the next equation: Customers times Spend. Businesses that generate digital revenues are highly dependent on more customers who spend more.
Fredric: That makes sense.
Rob: It’s not rocket science, believe me.
Tori Wieldt: Spending more money equals … happiness … and capitalism!
Rob: Capitalism at its finest! I love that, Tori. The challenge is that if I’m unhappy with the experience, I either, A, will not return or, B, will go to the competition.
Tori: It’s easier and easier to walk away.
Good enough is no longer good enough
Rob: That’s exactly right. If the checkout process is taking me 10, 15 seconds, I think nothing of short-circuiting that checkout process and going elsewhere, and two clicks later I’m gonna have that item delivered to my front door 24 hours later. It is a very easy scenario.
The point is that it is no longer good enough to be good enough. You have to be better than the competition.
Tori: On the technical side, we see a number of customers who do 170 deploys or thereabouts on a key night, and their MTTR is 62% better or something like that. We have a customer who reduced calls to an external vendor by 85%. It’s great that you’re quantifying that into business value. Common sense tells you that it makes things better, but it’s really nice to know that we can look at it and say how much better and what the numbers are.
Rob: Surveys explain little about the customer experience. So measure, therefore, what matters to that customer. Measure their Lost Customer Time. And remember that keeping a customer costs literally half as much as it does to acquire a new one.
Tori: Churn is key, right?
Rob: Churn becomes key. It has a direct correlation to the customer experience. And we can measure that with Lost Customer Time.
Tori: Yeah, error rates go up. Churn goes up. It’s amazing how those are tied together.
Rob: Who knew?
Fredric: Everybody knew—but no one thinks about it and uses it in their cost calculations.
The right way to look at ROI and TCO
Rob: When I can talk about revenues in the business value conversation, the ROI goes up nearly five X compared to the one or two X when OpEx is the only thing I can look at.
Fredric: It’s important for users to understand that while there may be an ROI that they see directly on the technical benefits, when they add the bigger picture and the business benefits, they’re actually seeing a much higher ROI.
Rob: No question. That’s because we have a 360-degree vista that we can look out across the entire IT ecosystem.
Fredric: The point is that not just any monitoring can do this; you really need the full-stack, complete-visibility monitoring that New Relic brings to the table.
Rob: That’s right.
Tori: We are starting to see the numbers in terms of the “flywheel effect.” If you have fewer errors, then people who aren’t busy firefighting can actually start working on new features or optimizing your code pipeline or reducing toil—and that just keeps accelerating on itself.
Fredric: Tori, you know I love cool jargon. You have to explain the “flywheel effect.”
Tori: The less time I’m spending firefighting, the more time I can optimize my code pipeline, therefore, my developers are more productive, and then I can have a better customer experience. We have less churn. It’s a virtuous cycle.
Fredric: That makes a lot more sense than the hamster wheel I was picturing in my mind.
Tori: No, that’s pre-DevOps.
How to measure DevOps success
Fredric: Oh, great, you’ve made my DevOps transition for me.
Rob, you do a great job of coming up with these really concise equations to measure these things. Let’s apply that to DevOps for a second. Many metrics that people use to predict DevOps success can be super-complicated and kind of fuzzy. But you’ve come up with a really simple one.
Rob: The way that we’re measuring that is in terms of the impact on headcount. For example, bug remediation, outages, incidents, those kinds of things all severely reduce the amount of money that customers have to spend on other things. So, when we take Volume times Time, that has a direct impact on headcount. When either Volume or Time is improved, resources are liberated to scale your digital business.
Even better, the impacts are not one-to-one, they’re one-to-many. So if I can reduce the number of issues, that’s great, but New Relic is also helping you find issues before the customer ever sees them. That becomes a multiplier event.
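To make the Volume times Time idea concrete, here is a minimal sketch of how reducing either factor could translate into freed-up headcount. The hours-per-FTE constant and the incident figures are assumptions chosen for illustration, not numbers from the conversation.

```python
HOURS_PER_FTE_PER_YEAR = 2000  # assumption: roughly 50 weeks x 40 hours


def effort_hours(volume: int, hours_each: float) -> float:
    """Volume times Time: total hours spent on incidents, bugs, or outages."""
    return volume * hours_each


def fte_equivalent(hours: float) -> float:
    """Convert a number of hours into full-time-employee equivalents."""
    return hours / HOURS_PER_FTE_PER_YEAR


# Hypothetical year: fewer incidents (volume) and faster resolution (time).
before = effort_hours(volume=1000, hours_each=6.0)
after = effort_hours(volume=800, hours_each=4.0)
freed = fte_equivalent(before - after)
print(f"Freed capacity: {freed:.1f} FTEs")
```

Because the two factors multiply, improving both at once frees more capacity than the sum of improving each one alone would suggest.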
Fredric: In your blog posts, you mention that one way to track how well you’re doing is a simple measure of frequency of software releases as a proxy for DevOps success. How does that work?
Rob: It is not uncommon for our customers to see a 100% increase in the number of code rolls that they can make during an existing dev cycle. In fact, I’ve even seen in some cases a 2 or 3 X improvement. Again, that means I need fewer people to achieve the same output.
Fredric: What about TCO? How does New Relic’s multi-tenant Software-as-a-Service (SaaS) approach lead to a better TCO?
Rob: I think this is a key area for SaaS. Basically, for every half of a full-time-employee (FTE) that you need to manage New Relic, you’d need two FTEs for an on-premise installation. The other important distinction is that a legitimate multi-tenant solution differs significantly from a single-tenant solution that’s just hosted in the cloud.
Fredric: Why should that matter?
Rob: Where you locate that single-tenant installation doesn’t matter. The complexity of running it is the thing.
Tori: You move it to the cloud and you get a hell of a bill and you’re like, “Wait a minute. Why are people so excited about this cloud stuff?”
Rob: On-premise vendors like to say, “We’ll maintain it.” But it just isn’t real. The math never comes out the way they think it does. It typically takes 24 to 48 hours to implement New Relic compared to weeks or sometimes months for doing a single-tenant or on-premise installation. Hosting a single-tenant installation in the cloud still takes the same amount of time as doing it on-premise, because you’re installing the same system.
Fredric: Right. It’s just a server somewhere else.
Rob: That’s all it is.
Avoiding the 4-Second Poverty Line
Fredric: We’ve been talking about saving money here for a little bit; let’s talk about the revenue side of the equation. One of the things you talk about is how performance can affect e-commerce revenue.
Rob: There’s a couple of things going on here. First, remember that load times matter. We talked earlier that Lost Customer Time has a detrimental impact on customer satisfaction with your digital products. The same thing happens in a media site or an e-commerce platform. There’s a term in the industry called the “4-Second Poverty Line.” Most e-commerce platforms know that they will lose a minimum of 80% of their convertible opportunities when the load time exceeds four seconds.
Fredric: Is that a magic number, not just a stake in the ground? There’s a drop-off at that point?
Rob: Absolutely. The four-second poverty line—that distinguished number—was established back in 2012.
Tori: Still holding?
Rob: It’s getting worse. The poverty line has actually dropped down to three seconds.
Tori: Ah, it’s Roger Bannister all over again, right?
Rob: That’s right, the 4-minute mile. They said it couldn’t be broken.
The point is that it’s no longer good enough to be good enough. You can’t be as good as the competition and expect to drive more business into your website. It isn’t gonna happen. You have to be better than the competition. So if three seconds is what the competition is delivering, if you’re an e-commerce platform, then you better be sub-three-seconds, flawlessly, every single time your customers are interacting with your website.
For most of our customers, the conversion rate for unique visitors is typically about 1% to 1.5%. So when we talk about a 0.5% delta, that’s significant in terms of the amount of money they’re generating. Every second quite literally counts. Time is money.
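A quick worked example shows why a 0.5-point change in conversion rate matters so much when the baseline is only 1% to 1.5%. It reuses the Customers times Spend equation from earlier in the conversation; the visitor count and average order value are hypothetical.

```python
def revenue(unique_visitors: int, conversion_rate: float, average_order_value: float) -> float:
    """Customers times Spend: converting visitors times what each one spends."""
    return unique_visitors * conversion_rate * average_order_value


visitors = 1_000_000          # hypothetical monthly unique visitors
average_order_value = 50.0    # hypothetical spend per order, in dollars

baseline = revenue(visitors, 0.010, average_order_value)   # 1.0% conversion
improved = revenue(visitors, 0.015, average_order_value)   # 1.5% conversion
uplift = improved - baseline

print(f"Baseline: ${baseline:,.0f}")
print(f"Improved: ${improved:,.0f}")
print(f"Uplift:   ${uplift:,.0f} ({uplift / baseline:.0%} more revenue)")
```

With a 1% baseline, moving to 1.5% means half again as many orders from the same traffic, whatever the visitor count or order value happens to be.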
Fredric: So, even if your site’s not broken, if you’re having even the slightest performance issue, it’s costing you real money every second.
Rob: You are spot on. I’ve asked the question of our e-commerce customers: “Is it possible that you can show 100% uptime and availability and still be leaking out revenues?” And they’ll nod and say, “Yeah, of course.” But when I ask them “Why?” I get blank stares because they don’t understand it. And it’s not because they’re dumb. The issue is they’ve never measured anything beyond uptime and availability.
Tori: They don’t know how to quantify it.
Rob: They don’t measure it. So here’s another equation: Speed times Capacity equals what I call the Velocity of Revenue. Another term for a New Relic dashboard is revenue throughput monitoring.
Tori: You can watch the dollars drop out.
Rob: You can watch the dollars go up and down. Is it possible to lose money during Black Friday 3:00 to 4:00 in the afternoon on your e-commerce platform? And the answer is absolutely. And it isn’t because it’s down. It’s because you’re literally blocking transactions from moving through because either, a) the capacity is too narrow or b) the speed at which you are serving up those transactions is too slow. Simple as that.
Fredric: You don’t know that you’re leaving money on the table.
Rob: You have no idea. No idea that you’re blocking transactions or that your abandonment rates are all of a sudden going through the roof.
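One way to picture Speed times Capacity is as a ceiling on how many transactions a platform can complete in an hour, even while it is fully "up." The sketch below is only one possible reading of that equation: the idea of concurrent slots, the function names, and every number are assumptions made for illustration.

```python
def max_transactions_per_hour(seconds_per_transaction: float, concurrent_slots: int) -> float:
    """Speed times Capacity: the most transactions that can complete in an hour."""
    speed = 3600 / seconds_per_transaction  # transactions per slot per hour
    return speed * concurrent_slots


def blocked_revenue_per_hour(
    demand: float,
    seconds_per_transaction: float,
    concurrent_slots: int,
    average_order_value: float,
) -> float:
    """Revenue that cannot get through when demand exceeds the ceiling."""
    ceiling = max_transactions_per_hour(seconds_per_transaction, concurrent_slots)
    blocked_transactions = max(0.0, demand - ceiling)
    return blocked_transactions * average_order_value


# Hypothetical peak hour: 10,000 attempted checkouts at $50 each, 10 slots.
slow = blocked_revenue_per_hour(10_000, 4.0, 10, 50.0)
fast = blocked_revenue_per_hour(10_000, 3.0, 10, 50.0)
print(f"Blocked at 4 s per transaction: ${slow:,.0f}")
print(f"Blocked at 3 s per transaction: ${fast:,.0f}")
```

In this toy scenario the site never goes down, yet the slower configuration turns away a tenth of attempted checkouts, which is the kind of leak that uptime and availability numbers alone would not reveal.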
The tyranny of mobile-app star ratings
Tori: The numbers are the same with mobile apps, right? People won’t even look at apps that are under four stars. If you’re not up in the top echelon, don’t bother. So your user experience, how quickly it loads … if I see a spinning wheel when I’m on my phone, I’m likely to abandon what I’m doing and give a bad rating.
Fredric: Rob, do you have numbers on how performance affects mobile ratings?
Rob: Get this, when the star rating drops below four stars, 50%—not 20%, not 30%, not 10%, half—of your potential customers will not download the app. So, performance? You better believe it!
Fredric: People give the low ratings for bad performance. Even if the actual app does what they want, if performance is bad, they don’t like it.
Tori: They never know. They try it once and …
Rob: Yeah, that’s exactly right. An airline that we were working with saw their star rating sitting at sub-three stars, and they went up a full star within six months post-deployment with New Relic. That’s a big, big move.
Surfing the Monitoring Maturity Curve
Fredric: Let’s put all this together. You’ve talked about something called “monitoring maturity” that rolls up all these factors. What is that concept?
Rob: Most companies are still down and to the left on the Monitoring Maturity Curve. That means they’re using APM as little more than an insurance policy. They’re buying insurance that says, “When I have an issue, I’m going to go to the monitoring systems and I’m going to look for the problems there.” Not only is that “immature” in terms of how they’re using it, but all the value that we’ve been talking about is being left on the table. That suggests their IT organization is probably not aligned as well as it needs to be with the revenue-generating business that it is supporting.
There’s more alignment as you move up and to the right on the Monitoring Maturity Curve. That’s when you start looking at conversion rates. You start looking at revenue impacts for load time, star ratings on mobile apps … those kinds of things. That means moving beyond the insurance product and actually using monitoring as a strategic resource and a competitive advantage. You’re saying, “I’m no longer just good enough. I’m starting to become better than my competition.”