Running New Relic agents and viewing the results in our dashboard allows you to monitor page response time as well as drill down into web transactions. Each of our language agents targets execution paths that are common for applications running in that language. For example, our .NET agent targets the ASP.NET pipeline to identify when web transactions occur, and the database provider libraries to identify database activity. As the figure below illustrates, the overall performance of the application and the detailed web transactions occurring are easily gleaned from the main dashboard view.
Recently, the .NET agent team was investigating an increase in the performance overhead imposed by the agent on applications. Normally, the agent imposes very little overhead but one of our customers, running web pages that execute in well under a second, started noticing an increase in the response time over their baseline.
We were able to reproduce the increased overhead under particular conditions but were not able to identify the code change that might have introduced it. While this investigation was going on, we were finishing up the Thread Profiling feature in the .NET agent. Thread Profiling is a low-impact profiling feature that can uncover bottlenecks that are not easily highlighted by a slow transaction trace. Why not apply this new feature to get insight on the agent performance itself? I’ll briefly introduce Thread Profiling before showing how we used it to find the problem.
Introducing Thread Profiling
A thread profile is triggered on demand by a New Relic user. You start a thread profiling session by selecting Events –> Thread Profiler from the dashboard’s menu, indicating the duration, from 2 to 10 minutes:
New Relic’s agent retrieves a stack snapshot of managed threads on a regular interval. When the profiling duration has elapsed, the data is sent to the New Relic server, where it is aggregated and presented in the form of a function call tree, as shown below.
A thread profile is a statistical sampling of application activity at the class and method call level. As each back trace is captured, it is entered into the tree so that a percentage of back trace samples in which each call path appeared can be calculated. The percentages are color-coded: red represents greater than or equal to 30%, yellow greater than or equal to 10% and black less than 10%.
Finding an Agent Performance Problem
As I mentioned earlier, we were trying to track down the overhead that the .NET agent added to a customer’s web application. We could reproduce the overhead in the lab but there have been many improvements and new features added to the agent, and it was difficult to narrow down where the performance lag was coming from.
After running a thread profiling session for five minutes and reviewing the results, however, we were able to identify the problem. According to the profile shown above, NewRelic.Agent.Core.Modules.RumHeaderFilter.Write is called 64% of sampling time, and the .NET Framework’s Regex.IsMatch 40% of the time. When we looked at the source, it was obvious what the problem was! We had a regular expression object, i.e., a instance of System.Text.RegularExpressions.Regex, created and regex-compiled for each web page into which we might inject our Real User Monitoring (RUM) monitoring code! The solution was to create a static instance of the Regex object and re-use it, since the regular expression never changed.
After making the code change and re-running the thread profile, our RumHeaderFilter.Write no longer appeared in the statistical profile because it occurs so infrequently in relation to the rest of the application and .NET Framework code. Problem solved!
Where Thread Profiling Can Help You
New Relic’s many features provide targeted data that can help you identify bottlenecks in your code. One of the reasons we created statistical thread profiling was to give you additional insight into your application performance.
By default, our agent only generates a detailed Transaction Trace when a web transaction’s response time exceeds a threshold. The default threshold is two seconds. There are times when the average server response time is well below the threshold for a trigger of a Transaction Trace. Even if you lowered the threshold, a Transaction Trace will show you a single transaction, and it is easy to miss the impact of a single call within the trace.
Our agent code was executing very quickly, but the Thread Profile clearly showed that a code path that was executing during a considerable number of samples taken. We also report the number of samples, as shown below. The greater the number of samples, the higher the resolution of the statistical profile.
This represents the perfect use case for Thread Profiling: a web transaction response time increases, or is at an unacceptable level, but the Transaction Trace detail does not point to a specific root cause. Our experience, described earlier, with the regular expression object is a great example of this.
We are really excited to bring the Thread Profiling feature to .NET developers! Please give it try and let us know your experience.