Why server monitoring is never enough

It’s almost 2011. What is a server, anyway? We’ve got app servers, database servers, file servers, name servers, proxy servers, web servers and more. These servers can run directly on a physical machine or inside a virtual one. Servers rely on internet connectivity, network speed, uninterrupted electric power, heating, cooling, ventilation…

A cheeky modern definition of infrastructure: Someone else’s problem

If the explosion of cloud hosting providers has shown us anything, it’s that many companies are eager to shed the maintenance and responsibility of physical infrastructure in return for on-demand capacity on a reliable platform. Many other apps are still hosted in privately run data centers, but even there it’s not uncommon for the servers to be managed by a separate operations group. Managing infrastructure, whether in the data center or in the cloud, is just about the last thing a business or application developer wants to have to think about.

Apps are more than the sum of their parts

Web applications can build communities, engage users and sell products. By democratizing communication and removing barriers, web applications can change the world. Web applications produce value. They are much more than the sum of their server parts. Web developers and application users share the same desires. They want the service to be great, and they want it to be fast. I care if I can log into Gmail and send an email quickly. I don’t care if one of the hard drives on one of the servers in one of the clusters in one of the data centers is getting full.

Manage Value

Only by focusing on the value of a system, instead of just its infrastructure, can we ensure our customers stay happy. How quickly can I see a list of my friends who are online inside your social app? How quickly can I buy something from your eCommerce site? How long does that search feature take to respond? That is the value we need to be managing.

Actionable Management

When you only monitor servers, you get data without any context, and that makes it pretty useless. “Warning: High CPU utilization on prod-db-2.example.com.” What do you do about that? Who is even going to be alerted? Do your developers know about that spike in CPU? What action can they possibly take? What is consuming those extra resources? When did it start happening? Did a recent deploy cause this change in load? Is a new app feature consuming these extra resources? Ahhhh!

When you only monitor servers, the signal-to-noise ratio can be very poor. When you monitor your applications from the inside out, you’ll get the answer to each of those questions and more.
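
To make “from the inside out” a little more concrete, here is a minimal Python sketch of the idea: time each named transaction inside the app itself, so a slowdown can be traced to a specific page or feature rather than to a host. Everything in it is hypothetical and for illustration only (the `timed` decorator, the in-memory `timings` store, the `friends/online` transaction name). It is not how New Relic RPM is implemented; a real agent collects far more than this.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store: transaction name -> list of durations (seconds).
timings = defaultdict(list)


def timed(name):
    """Record how long each call to the wrapped function takes, under `name`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even if the call raises, so failures are timed too.
                timings[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("friends/online")
def list_online_friends(user_id):
    # Stand-in for the real work (database query, cache lookup, and so on).
    time.sleep(0.01)
    return ["alice", "bob"]


def slowest(n=3):
    """Return up to n (name, average seconds) pairs, slowest first."""
    averages = {name: sum(d) / len(d) for name, d in timings.items() if d}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

After a few requests, `slowest()` answers “what is taking the time?” by transaction name, which is a question a CPU graph alone cannot answer.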

High CPU Load on the Database, in two acts

Server Monitoring Only:

Alert: High CPU Utilization on prod-db-2.example.com

Ops: OMG! The database server CPU just spiked way up! Someone make it stop! Oh, the horror!
Dev: Um, what?
Ops: database server… CPU… high… the processes… the disk… everything… normal?
Dev: Hey, I’m going to grab some coffee, see you in a bit.
Ops: OK, maybe the load is related to traffic or something… oh well… I’m sure the site is fine

Application Monitoring:

Alert: Apdex fell to 0.78 [1.0]* over the last 1 minute, below the caution threshold of 0.85
Dev/Ops: Hmm… the site just slowed down a bit, I wonder what’s up?
Dev/Ops: New Relic RPM shows a spike in response time at the database tier right after our last deploy
Dev/Ops: That new query we added into the detail page is taking the most time
Dev/Ops: That page alone is hit 142 times a minute, and it’s triple our site’s average response time!
Dev/Ops: According to this transaction trace, that query isn’t using the indexes properly.
Dev/Ops: We’ll work on a fix right away and have it ready to deploy soon
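
For readers who haven’t met Apdex before, the score in that alert can be sketched in a few lines of Python. This follows the standard Apdex definition: requests at or under a target time T count as satisfied, requests between T and 4T count as tolerating (half credit), and anything slower counts as frustrated. The sample data, the function names, and the reading of the bracketed 1.0 as T in seconds are assumptions made for illustration, not output from RPM.

```python
def apdex(response_times, t):
    """Standard Apdex score: (satisfied + tolerating / 2) / total samples."""
    if not response_times:
        raise ValueError("need at least one response time")
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)


def check(response_times, t=1.0, threshold=0.85):
    """Return an alert string if the score is below threshold, else None."""
    score = apdex(response_times, t)
    if score < threshold:
        return (f"Alert: Apdex fell to {score:.2f} [{t}], "
                f"below the caution threshold of {threshold}")
    return None


# Made-up sample: 70 fast requests, 16 tolerable ones, 14 frustrating ones.
sample = [0.5] * 70 + [2.0] * 16 + [5.0] * 14
print(check(sample))
```

With this made-up sample the score works out to (70 + 16 / 2) / 100 = 0.78, so the check fires. The point is that the number is built from what users actually experienced, not from how busy a machine was.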

Manage your apps, not just your servers

Want to see how your applications are performing in production from the inside out?
Try New Relic RPM right now for free!

brian@emphaticsolutions.com

Marketing at Github
