Dealing with web application errors is normally a huge headache. To help ease this burden, New Relic recently introduced an Error Tracking feature in RPM. Here’s how it works: if you are a Silver or Gold customer, RPM hooks into the ActionController#rescue_action method and records information about every uncaught exception. That information is forwarded to the RPM site, where we organize errors by exception class.
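To make the idea concrete, here is a minimal sketch in plain Ruby of recording uncaught exceptions and grouping them by class. It is not the RPM agent's code: the ErrorRecorder class and its track, record, and counts_by_class methods are hypothetical names invented for illustration, and the sketch rescues StandardError around a block rather than hooking into ActionController#rescue_action.

```ruby
# Hypothetical sketch, not the RPM agent's implementation.
# Records exceptions raised inside a block and groups them by class name.
class ErrorRecorder
  def initialize
    # Maps an exception class name to the list of recorded occurrences.
    @errors = Hash.new { |hash, key| hash[key] = [] }
  end

  # Runs the block; if it raises, records the exception and re-raises it
  # so the caller's normal error handling still runs.
  def track
    yield
  rescue StandardError => e
    record(e)
    raise
  end

  # Stores the message and timestamp under the exception's class name.
  def record(exception)
    @errors[exception.class.name] << { message: exception.message, time: Time.now }
  end

  # Returns a hash of class name => number of occurrences.
  def counts_by_class
    @errors.each_with_object({}) { |(name, list), out| out[name] = list.size }
  end
end
```

Grouping by class name is what makes a summary readable: a few hundred occurrences of the same exception collapse into one row with a count.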
On the main dashboard page, there is a column for Errors that shows the error rate per minute. Depending on the application, anywhere from 1 to 30 errors per minute could be acceptable. For us, acceptable errors are typically ActionController::RoutingError exceptions caused by spam-bots.
Here’s how we use the feature. First, we set a threshold value so that the dashboard shows yellow or red if the error rate is higher than we expect. Because we set a threshold, the RPM Incident feature emails us when a violation occurs, which is nice because we don’t have to be looking at RPM to get notified of the problem. Second, every morning I sign in and run a 24-hour query on the Errors page. I get a nice summary of the errors that occurred and can easily spot-check for serious ones.
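The threshold idea can be sketched in a few lines of Ruby. This is an illustration only: the error_status method, its caution and critical parameters, and the default values of 30 and 100 errors per minute are assumptions made for the example, not RPM's actual settings or API.

```ruby
# Hypothetical sketch: maps an error rate to a dashboard color.
# The default threshold values are made up for illustration.
def error_status(errors_per_minute, caution: 30, critical: 100)
  return :red if errors_per_minute > critical
  return :yellow if errors_per_minute > caution
  :green
end
```

With these assumed defaults, a rate inside the acceptable range stays green, while a jump to a few hundred errors per minute turns red and would be the point at which an alert goes out.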
Here’s a real-world example of the Error Tracking feature in action. First, I received this email from RPM.
Two things caught my attention: first, that my configured threshold was violated, and second, that the incident heuristic detected an abnormal jump. I switched over to RPM and saw the following screen.
Yikes! Over 200 errors per minute. Clicking on the red light, I was taken to the Errors view.
The Runt:: error is one that we are aware of and have a pending Agent fix for. The other error is new. Flipping back to the overview dashboard, I noticed that this error was happening on only one host. Clicking on the Count field in the Errors view showed all 219 errors, and I noticed they were all coming from a single instance of my app. The error itself was not very interesting. The message was:
Expected /data/rpm/releases/20081021033815/app/controllers/agent_listener_controller.rb to define AgentListenerController
This error normally means there was a compilation bug. But since every occurrence came from a single instance running the same code as the others, my theory was that this was transient. Sure enough, I restarted just that mongrel and the problem went away. Without RPM giving us rapid and deep insight into this issue, it’s hard to imagine how long it would have taken me to figure it out.