How to use PagerDuty with RPM's Availability Monitor

In this post our very own Bill Kayser, lead engineer for the Availability Monitoring feature, offers some great tips for getting the most out of RPM and PagerDuty. At New Relic we’ve been using PagerDuty to enhance the notification and workflow features of our own alerting system. We’ve found it especially useful with the new Availability Monitoring feature. It’s simple to configure, inexpensive, and gives you great flexibility in setting up notifications when your application is down.

PagerDuty allows you to set up per-user contact information and preferences, including SMS, e-mail and even telephone notifications. You can set up an on-call rotation with escalation policies. It has a simple but effective workflow for incidents. All you need is a way of generating an incident, which is where RPM comes in. We use a custom e-mail address provided by PagerDuty as the notification e-mail in our availability monitoring preference:

[Screenshot: Availability Monitor notification e-mail setup]

Whenever the pinger detects a failure in an application, it sends the notification e-mail to PagerDuty, which opens an incident and starts the workflow. This usually means that one of us gets paged with a description of the incident: “https://www.newrelic.com is failing with an expired SSL certificate.” Because our notification e-mails have a MIME text attachment with a very short link, the SMS messages display the abbreviated downtime information.

In addition to the simple integration available in the Availability Monitoring configuration, you can use PagerDuty to generate your own alerts from inside your application. We use it from some of our Java applications. Using their REST API, you can not only trigger events but also acknowledge and resolve them programmatically if the incident clears up before manual intervention. It looks something like this:

PagerDutyAPI api = new PagerDutyAPI(ourCustomAPIKey);
Incident incident = new Incident("Running out of memory on " + app_name + "!");
api.trigger(incident);

// later, if things improve:
incident.addDetail("Recovered at", new Date().toString());
api.resolve(incident);
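
If you don't have the helper classes handy, the trigger and resolve calls above boil down to small JSON events posted over HTTPS. Here is a minimal sketch of what that might look like, assuming the field names of PagerDuty's Events API v2 (routing_key, event_action, dedup_key, payload); check the current PagerDuty documentation before relying on them. The PagerDutyEvent class, its method names and the hard-coded "critical" severity are hypothetical and are not part of the helper classes mentioned in this post.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical illustration only; not the PagerDutyAPI helper used above. */
class PagerDutyEvent {
    // Assumed Events API v2 endpoint; verify against PagerDuty's docs.
    static final String ENDPOINT = "https://events.pagerduty.com/v2/enqueue";

    /** Escapes backslashes, double quotes and control characters for JSON. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.toString();
    }

    /** Builds a "trigger" event; dedupKey identifies the incident later. */
    static String trigger(String routingKey, String dedupKey,
                          String summary, String source) {
        return "{\"routing_key\":\"" + escape(routingKey) + "\","
             + "\"event_action\":\"trigger\","
             + "\"dedup_key\":\"" + escape(dedupKey) + "\","
             + "\"payload\":{\"summary\":\"" + escape(summary) + "\","
             + "\"source\":\"" + escape(source) + "\","
             + "\"severity\":\"critical\"}}";
    }

    /** Builds a "resolve" event for a previously triggered dedupKey. */
    static String resolve(String routingKey, String dedupKey) {
        return "{\"routing_key\":\"" + escape(routingKey) + "\","
             + "\"event_action\":\"resolve\","
             + "\"dedup_key\":\"" + escape(dedupKey) + "\"}";
    }

    /** Posts the JSON body and returns the HTTP status code. */
    static int send(String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

Under the same assumptions, you would call something like PagerDutyEvent.send(PagerDutyEvent.trigger(key, "app-oom", "Running out of memory on " + app_name + "!", app_name)) when trouble starts, and send the matching resolve event with the same dedup key once things improve. An acknowledge event would follow the resolve shape with "acknowledge" as the event_action.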

I’ve made my PagerDuty Java helper classes available on GitHub. As always, we’d love to hear your feedback. Drop us a line any time at support@newrelic.com.

– Bill Kayser
