Because your web application is running on an unreliable network and being accessed by unpredictable people, your web app’s performance is not likely to show up as a perfectly straight line on your charts in New Relic. When you see a spike in your response time, error rate, or other vital metric, how should you figure out exactly what caused the spike? You can now zoom in on spikes just by dragging over them on a chart.
I’ll walk you through how you can use this zoom feature to find what you need to optimize or fix. Let’s look at a six-hour window from yesterday in the New Relic account that monitors New Relic itself:
That spike in the center looks suspicious, so let’s take a closer look. Just drag on the chart and select it:
New Relic will then zoom in on that time period for you, updating the whole page:
Indeed, there’s something more going on there. I suspect that a few slow transactions dragged up the response time average, so let’s take an even closer look. Zooming in again, I see that there wasn’t a corresponding bump in throughput, which suggests the spike didn’t come from extra traffic. So let’s take a look at the transaction traces that New Relic has collected. If there are some seriously long-running ones, we’ll know that this spike wasn’t just a fluke.
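To see why a handful of slow transactions can produce a spike like this without any change in throughput, here is a rough sketch in Python. The numbers are made up for illustration and are not taken from our charts; `normal_minute`, `spike_minute`, and `summarize` are hypothetical names, not part of New Relic.

```python
from statistics import mean, median

# Hypothetical response times in milliseconds for two one-minute windows.
# Both windows hold the same number of requests, so throughput is flat.
normal_minute = [120] * 100
spike_minute = [120] * 97 + [9000, 12000, 15000]  # three slow transactions


def summarize(samples):
    """Return (throughput, average, median) for a list of response times."""
    return len(samples), mean(samples), median(samples)


normal = summarize(normal_minute)
spike = summarize(spike_minute)

print("normal:", normal)
print("spike: ", spike)
```

With these invented numbers, both minutes serve 100 requests and the median stays at 120 ms, yet the average jumps from 120 ms to 476.4 ms. That is the pattern to look for: a taller average with flat throughput points at a few long-running transactions rather than at more traffic.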
It looks like #metric_regex is the prime culprit. Excuse me while I add a ticket to our soon-to-be-networked help desk.