OpsGenie

This is a guest post written by Berkay Mollamustafaoglu, Founder of OpsGenie. OpsGenie provides SaaS-based alert notifications and mobile response capabilities. The OpsGenie New Relic integration enables users to stay connected when they are remote or mobile. Using OpsGenie smartphone apps, users can control how and when they should be notified.

For many of us involved in operations, alerts we receive are the face of the management systems. We often interact with management systems through the alerts they generate, rather than their user interfaces. Nielsen & Norman state that "user experience" encompasses all aspects of the end user's interaction with the services, and not only the user interface (UI). "Alert design" typically entails the content of the alert, level of obtrusiveness required, how the alert should be routed, how and when the recipients should be notified, and what the potential actions are that recipients can take.

At OpsGenie, we believe that good alert design is essential to maximize the value we get from management systems. New Relic's new "alerting capabilities," announced at FutureStack, are a big leap forward in good alert design. They provide not only better content but also more control over when alerts are generated. By integrating with OpsGenie, New Relic customers can further enhance the experience of the alert recipients.

Some of the capabilities OpsGenie adds to New Relic include:

    • On-call schedules with weekly, daily, and custom rotations
    • Multiple notification methods: SMS, phone call, mobile push and email notifications to ensure notification delivery
    • Automated and ad hoc escalations to ensure critical alerts don't fall through the cracks
    • Notification rules that empower users to control how they are notified for different alerts at different times of the day
    • Tracking throughout the alert life-cycle: who was notified, when, and how
    • Ability to respond to alerts and collaborate with others from mobile apps

Integration: Webhooks to the rescue

Webhooks are emerging as the de facto integration method for web services. New Relic's support for webhooks as an alert destination makes it easy to forward New Relic alerts to external systems over the web. New Relic webhooks are real-time HTTP(S) POSTs with a JSON payload that New Relic sends to a callback URL, and OpsGenie has a JSON/HTTPS web API. Taking advantage of New Relic webhooks and the OpsGenie web API, we were able to implement the integration quickly.

On the OpsGenie side, we've implemented an endpoint that receives the webhook requests from New Relic, and creates/closes alerts in OpsGenie. For mutual customers, using this integration, sending New Relic alerts to OpsGenie is straightforward: create a new webhook channel and provide the OpsGenie endpoint as the webhook URL.
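To make the create/close flow concrete, here is a minimal sketch in Python of how a receiving endpoint might map a webhook body to an alert action. It is not OpsGenie's implementation: the payload field names (`state`, `alert_id`, `message`) and the state values are assumptions chosen for illustration, and the real New Relic webhook schema may differ.

```python
import json

# Hypothetical state values; the real New Relic webhook payload may use others.
OPEN_STATES = {"open", "opened"}
CLOSED_STATES = {"closed", "recovered"}


def action_for_webhook(body: str) -> dict:
    """Map a webhook JSON body to an alert action: create, close, or ignore."""
    payload = json.loads(body)
    # "state", "alert_id" and "message" are assumed field names for this sketch.
    state = str(payload.get("state", "")).lower()
    # The alias lets a later "close" event be matched to the alert it created.
    alias = str(payload.get("alert_id", ""))
    if state in OPEN_STATES:
        return {
            "action": "create",
            "alias": alias,
            "message": payload.get("message", ""),
        }
    if state in CLOSED_STATES:
        return {"action": "close", "alias": alias}
    # Anything unrecognized is ignored rather than turned into a noisy alert.
    return {"action": "ignore", "alias": alias}
```

Keying both actions on the same alias is what allows a recovery notification to close the alert that the original notification opened, instead of creating a second one.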

OpsGenie Webhook

Once the webhook channel is created, it can then be used as an alert channel either directly or as part of a notification group. New Relic's alert policies allow grouping applications, key transactions, and servers, and applying different policies to each group. OpsGenie's API endpoint supports parameters as part of the URL. New Relic users can create different webhook channels to route alerts to different people (on-call schedules, escalation policies, etc.) by specifying the recipients parameter as part of the webhook URL.
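As an illustration of routing by URL, the sketch below builds one webhook URL per group of recipients. Only the `recipients` parameter name comes from the description above; the base URL, the `apiKey` parameter, the recipient names, and the comma-separated format are placeholders and assumptions, so check the OpsGenie integration page for the real values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def webhook_url(base_url: str, recipients: list[str]) -> str:
    """Return base_url with a recipients query parameter appended."""
    parts = urlsplit(base_url)
    # Keep any query parameters already on the base URL (such as an API key).
    query = parse_qsl(parts.query)
    # Assumption: multiple recipients are passed as one comma-separated value.
    query.append(("recipients", ",".join(recipients)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder endpoint and key, not a real OpsGenie URL.
BASE = "https://opsgenie.example/newrelic?apiKey=YOUR_API_KEY"

# One webhook channel per audience (hypothetical recipient names).
database_channel = webhook_url(BASE, ["database_oncall_schedule"])
web_channel = webhook_url(BASE, ["web_oncall_schedule", "ops_escalation"])
```

Each resulting URL would be entered as the webhook URL of a separate New Relic channel, and each channel attached to the alert policy whose alerts should reach that audience.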

New Relic Alert Policy

OpsGenie also provides additional controls to further improve the alert signal-to-noise ratio. We believe that while it's vital that critical alerts are not missed, it's equally important to ensure that users are not disrupted unnecessarily. Mindful of the fact that attention is a scarce resource, OpsGenie provides a range of features to prevent alert fatigue.

OpsGenie Controls

For example, using OpsGenie Automations, alert notifications can be suppressed or delayed (such as during deployments). OpsGenie gives alert recipients full control over notifications. Users can define rules for how they are notified for different types of alerts at different times of the day, and decide whether they'd like to be notified when an alert is acknowledged or closed, when their on-call duty is starting, and so on. Users can turn notification rules or methods on and off, or mute notifications temporarily to avoid getting bombarded with them while they're working on issues.
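The idea behind suppression and time-of-day rules can be sketched in a few lines of Python. This is only an illustration of the logic, not how OpsGenie evaluates its rules; the function name, the quiet-hours window, and the `deploying` flag are all hypothetical.

```python
from datetime import datetime, time


def should_notify(now: datetime, quiet_start: time, quiet_end: time,
                  deploying: bool = False) -> bool:
    """Decide whether a notification should go out right now."""
    # Suppress everything while a deployment is in progress.
    if deploying:
        return False
    current = now.time()
    if quiet_start <= quiet_end:
        # Quiet window within a single day, e.g. 13:00 to 14:00.
        in_quiet_hours = quiet_start <= current < quiet_end
    else:
        # Quiet window that wraps past midnight, e.g. 22:00 to 07:00.
        in_quiet_hours = current >= quiet_start or current < quiet_end
    return not in_quiet_hours
```

A real rule set would also take alert priority into account, so that a critical alert can still get through during quiet hours.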

New Relic alert policies are a major step forward in giving users better alerts. Using New Relic and OpsGenie together, it's now feasible to design even better alerts and improve the experience of alert recipients. Remember, sending an alert that just says "you have an alert" is very hollow - let's move beyond waking users up to empowering them to solve problems with minimal disruption.