Alerting updates are coming, and they’ll help you gain better visibility into the status and health of your Applications, Key Transactions and Servers monitored by New Relic.
Over the next few weeks we will be rolling out the ability to choose from three different notification levels on your notification channels, letting you control how many alerts are sent to each channel. In the context of Alert Policies, each monitored Application, Key Transaction, and Server is called an alertable. Based on your feedback, we are also adjusting the default notification level to send a notification any time an alertable turns red in a policy.
New Relic’s customers span a huge spectrum of sizes, shapes, and ops needs — for some, New Relic is the one-stop-shop and we are the only dev, dev/ops or ops interface in the building.
For others, New Relic is still the de facto root cause analysis platform, but also part of an established operations infrastructure. New Relic alerts are just a piece of the puzzle, working in tandem with complementary tools and sending alerts to a centralized event/incident management tool. This is what we had in mind with our recent updates, which give everyone more control over the verbosity of the alerts we send under our Alert Policy System.
Here’s a look at how it works:
The following notification levels can now be applied to Webhooks, HipChat, PagerDuty, Campfire and Email alert channels.
All critical events (new) — send all critical events for all alertables in the policy.
If you are using an external event management system (PagerDuty, xMatters, VictorOps, OpsGenie) this is likely the setting you want.
First critical event (default) — send first critical event for each alertable in the policy. All downtime events are sent as well.
When anything turns red in your policy, we’ll let you know. This is a great place to start with alerting because only the first alert for each alertable triggers a notification, meaning you won’t be overwhelmed with alerts while responding to problems.
Downtime events — send application and server down events only.
New Relic currently has two methods to measure ‘down’ events: 1) Application-down events, which are triggered by our availability monitoring (pinging) of a supplied application URL. 2) Server-not-responding events, which are triggered when a server running our monitoring agent is unable to send metrics to New Relic.
Receiving only down events can be useful in some cases. Maybe management only wants to know when something is down, or maybe you have a separate escalation policy in your on-call system and respond differently to down vs. critical events. Either way, the flexibility is there and you can set it up however you want.
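To make the difference between the three levels concrete, here is a minimal Python sketch of how a channel might decide whether to send a given event. It is an illustration only, not New Relic's implementation: the names Level, Event, Channel and delivered are hypothetical, and the sketch assumes that downtime events are also sent at the "All critical events" level and that "first" means the first critical event ever seen for an alertable (it does not model an alertable recovering and turning red again).

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    """Hypothetical names for the three notification levels."""
    ALL_CRITICAL = "all_critical"
    FIRST_CRITICAL = "first_critical"  # the default level
    DOWNTIME_ONLY = "downtime_only"


@dataclass(frozen=True)
class Event:
    alertable: str  # e.g. an application, key transaction, or server name
    kind: str       # "critical" or "downtime"


@dataclass
class Channel:
    level: Level = Level.FIRST_CRITICAL
    # Alertables that have already produced a critical notification.
    seen: set = field(default_factory=set)

    def should_notify(self, event: Event) -> bool:
        # Assumption: downtime events are sent at every level.
        if event.kind == "downtime":
            return True
        if self.level is Level.DOWNTIME_ONLY:
            return False
        if self.level is Level.ALL_CRITICAL:
            return True
        # FIRST_CRITICAL: only the first critical event per alertable.
        if event.alertable in self.seen:
            return False
        self.seen.add(event.alertable)
        return True


def delivered(level: Level, events: list) -> list:
    """Return, per event, whether a channel at this level would send it."""
    channel = Channel(level=level)
    return [channel.should_notify(e) for e in events]


events = [
    Event("app-a", "critical"),
    Event("app-a", "critical"),
    Event("app-a", "downtime"),
    Event("server-1", "critical"),
]

for level in Level:
    print(level.value, delivered(level, events))
```

With this sample stream, the default level suppresses only the repeated critical event for app-a, while the downtime-only level sends just the single downtime event.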
Do you use the PagerDuty integration in New Relic?
We’ve been working closely with our friends over at PagerDuty and are excited to announce that with the new Notification Levels feature we updated our PagerDuty alert channel to match the expected incident granularity in PagerDuty. For each critical event triggered in New Relic, a new incident will be created in PagerDuty. This new behavior will take effect automatically, and you can learn more about these changes here.
We hope Notification Levels makes your ops life easier, and we’d love to hear your feedback.