How we provide real user monitoring: A quick technical review
This week New Relic released a whack of new and cool stuff. One of the coolest of the new is Real User Monitoring (RUM). Based on the JavaScript library created by Google’s Steve Souders, RUM shows you what performance issues your actual users are experiencing on your site, right now. Finally you can see beyond the app tier. After all, even when things are performing fine on the server, who knows what’s happening on the network and in the browser? New Relic.
What does it measure?
Real User Monitoring details how long pages take to load and where the time is spent – from the moment the user clicks until the page has completed loading. Typically, for each click, a request is sent over the network to a server, where it sits in a queue waiting to be processed. Eventually it reaches the front of the line and is processed by the app, which creates a response. The response is sent back over the network and is then loaded and rendered in the browser.
Here’s how New Relic displays page load time for an app. It’s very similar to the timeline shown above except that network time is combined into the total round trip over the internet.
Page load time is broken down into five components:
- Request queuing: The wait time between the web server and the application code.
- Web application: The time spent in the application code.
- Network: The time it takes for a request to make a round-trip over the internet.
- DOM Processing: Time spent in the browser parsing and interpreting the HTML.
- Page Rendering: Time spent in the browser displaying the HTML, running in-line JavaScript and loading images.
The graph makes it easy to see how a spike in one component, like network time, can affect overall page load times.
Page load time is further broken down by geographical location. For example, you can see how page load times vary around the planet.


And you can choose up to five countries to explore regional differences. For example, you can see what the average network time looks like across the United States.
The data is displayed in tables as well as maps, and you can render the geographical data many ways – by page load time, network time, heaviest users, Apdex, throughput and sessions.
The data is also sliced and diced by the pages on your site. You can easily find the most heavily used pages on your site and drill into their performance. And of course, you can easily link from the end user tier to the app tier for a particular page.
All the gory detail of what happens in the application for a particular request is at your fingertips.

You can also see what browsers people are using to view your site and how their performance varies.
How does it work?
If you’re already familiar with New Relic’s application monitoring then you probably know about New Relic’s agents, which run in-process on web apps, collecting and reporting all sorts of details about what’s happening in the app. RUM leverages the agents to dynamically inject JavaScript into pages as they are built. The injected JavaScript collects timing information in the browser and contains details that identify the specific app and web transaction processed on the backend as well as how time was spent in the app for each request. When a page completes loading in an end user’s browser, the information is sent back to New Relic asynchronously – so it doesn’t affect page load time.
You can turn RUM on or off via your application settings in New Relic. You can also turn it on or off via the agent’s configuration file (newrelic.yml, where a ‘browser_monitoring auto_instrument’ flag has been introduced).
The agents have been enhanced to automatically inject JavaScript into the HTML pages so using RUM is as simple as checking the checkbox on the New Relic control panel. However, if you’d prefer more control, you can use New Relic’s Agent API to generate the JavaScript and thus control exactly when and where the header and footer scripts are included.
Time spent in the browser is known as the “front end”, whereas time spent on the network and in the app is known as the “back end”. The JavaScript header is the first snippet executed as the page loads. It records a “first byte time” and then asynchronously loads a small JavaScript library that gets inserted into the DOM and is cached by the browser. The library listens for the “DOMContentLoaded” and page “load” events in order to determine how much time is spent processing the DOM and rendering the page respectively, relative to the first byte time. The footer is executed last and simply provides context for the measures being reported – an application identifier, an API license key identifying the account, how much time was spent in the queue and in the app, as well as an obfuscated web transaction name. The agent obfuscates transaction names so that your application implementation details are not revealed in the page’s source code.
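As a rough illustration of the header and library behaviour described above, here is a minimal sketch of timing the two front-end phases from the “DOMContentLoaded” and “load” events. It is not New Relic’s actual script: the function name recordFrontEndTimings, its parameters, and the fallback used when “DOMContentLoaded” never fires are all assumptions made for this example.

```javascript
// Hypothetical helper, not New Relic's real library.
// doc and win are anything exposing addEventListener (document and window
// in a browser); firstByteTime is the timestamp the header snippet recorded.
function recordFrontEndTimings(doc, win, firstByteTime, now = Date.now) {
  const timings = { domProcessingMs: null, pageRenderingMs: null };
  let domReadyAt = null;

  // DOM processing: from first byte until the HTML is parsed.
  doc.addEventListener("DOMContentLoaded", () => {
    domReadyAt = now();
    timings.domProcessingMs = domReadyAt - firstByteTime;
  });

  // Page rendering: from DOM ready until the load event.
  win.addEventListener("load", () => {
    // Assumption: if DOMContentLoaded was never observed, measure
    // rendering from the first byte instead.
    const start = domReadyAt === null ? firstByteTime : domReadyAt;
    timings.pageRenderingMs = now() - start;
  });

  return timings;
}
```

In a browser this would be called as recordFrontEndTimings(document, window, firstByteTime) as early in the page as possible; the returned object fills in as the events fire.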
In order to determine how much time is spent on the backend we have to know when a navigation began (relative to the first byte time recorded by the header). This time is determined differently depending on the API a browser supports. The Web Timing API, a new standard supported by browsers such as Chrome and IE9, provides the navigation start time. In other browsers a cookie is used to record the navigation start time as a page is hidden or unloaded via the “pagehide” and “beforeunload” events respectively – whichever comes first or is supported by the browser. As a user moves from one page to the next, four times are recorded: navigation start, first byte, DOM load, and page load. Since the JavaScript contains the app and queue times measured by the agent, network time can be computed on the backend as the difference: the time from navigation start to first byte, minus the queue and app times.
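The arithmetic can be sketched as follows. This is an illustration built only from the description above, not New Relic’s collector code; the function name pageLoadBreakdown and its field names are hypothetical. Timestamps are in milliseconds on the browser’s clock, while queueMs and appMs are the durations reported by the agent.

```javascript
// Hypothetical sketch of splitting one page load into the five components.
function pageLoadBreakdown(t) {
  // Back-end time: from the start of navigation until the first byte arrives.
  const backEndMs = t.firstByte - t.navigationStart;
  return {
    requestQueuingMs: t.queueMs,
    webApplicationMs: t.appMs,
    // Network is whatever back-end time the queue and app did not use.
    networkMs: backEndMs - t.queueMs - t.appMs,
    domProcessingMs: t.domContentLoaded - t.firstByte,
    pageRenderingMs: t.load - t.domContentLoaded,
    totalMs: t.load - t.navigationStart,
  };
}

// Made-up numbers, for illustration only.
const example = pageLoadBreakdown({
  navigationStart: 0,
  firstByte: 400,
  domContentLoaded: 900,
  load: 1500,
  queueMs: 20,
  appMs: 180,
});
console.log(example.networkMs); // 200
```

The five components always sum to the total, which is why a spike in one of them shows up directly in overall page load time.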
How do we do it?
New Relic operates a set of highly efficient data collectors. The collectors preprocess and aggregate browser metrics before sending them along to our analytics system for presentation. We’ve chosen a hybrid cloud data model to allow us to scale as demand grows. Currently, the collectors take 0.16 milliseconds to process each ping back from an end user’s browser, and we’re only consuming 5% of our server capacity. Just think – each time one of our customers gets a hit – so does New Relic.
RUM is also integrated with New Relic’s Apdex score – you can define a page load target time for your app to get a better view into your customers’ satisfaction. End user details are included in all of New Relic’s standard reports and emails. RUM is also integrated with New Relic’s alerting system so you can be notified when end user performance degrades – even when application performance does not.
What next?
It’s included at all price levels. If you don’t already use New Relic, you can always try it out for free with our “two minutes to first light” install. You’ll get insight into how long users are waiting for page loads, where they’re coming from, and if they are having different experiences based on where they are or what they’re using.
Sign up here. It's free, so why not?







This looks like a great feature. How do I enable the latest newrelic_rpm gem in a Heroku app? I’ve included the gem in the Gemfile but Heroku’s automatic plugin installation seems to override it.
Darin Wright Reply:
May 18th, 2011 at 5:54 pm
There’s a support discussion regarding deployment on a Heroku app. Looks like the discussion was private… but the advice was like this:
The problem you are seeing results from the new gem conflicting with the old plugin you may have installed through plugins. You can put a file in the plugin directory to force heroku not to load the plugin, like so:
#!/bin/bash
mkdir -p vendor/plugins/rpm/
touch vendor/plugins/rpm/foo
git add vendor/plugins/rpm/foo
git commit -m 'force heroku not to add the plugin'
Sorry for the trouble and I hope this fixes things up for you
Posted: 17 May 2011 at 6:29 am by sam
Does RUM instrument all pages served, or does it instrument a representative sample? Have you measured the performance implications of injecting RUM?
Very exciting development! Very excited to start using it, depending on the performance cost.
Darin Wright Reply:
May 18th, 2011 at 5:52 pm
The intention is that RUM instruments all pages. In practice, auto-RUM might not get all pages – for example, the Java Agent is designed to auto-instrument JSPs. However, in order to gather metrics for all your pages, one would like all pages to be instrumented. The performance impact on the agent to generate and inject JavaScript is negligible – for example, less than 1ms on the Ruby Agent. We believe the overhead of the JavaScript in the browser is also in the noise. Thanks for your interest! To date we’ve not heard any complaints about performance penalties – be sure to let us know if you find something.
Posted: 18 May 2011 at 9:12 am by Rob
I have some concerns with the accuracy of the measurements. We just deployed today and I’m seeing a 5 second average across our site. But if our site averaged 5 seconds we would be out of business! Clicking around the site, from any browser, the pages load in subsecond time. Using PingDom, we average 1 second from Europe! Using Google Analytics we average less than 1.5 seconds across everything. So I’m not sure what’s going on but it really seems like there is a problem with the math or the methods somewhere.
One of our pages has a very simple structure, the DOM processing is very minimal, but RPM is showing that it spends 2.5 seconds average in the DOM on this page. That seems incorrect to me.
Is it possible that ajax calls to services, and other asynchronous .js calls (like to google analytics) could be involved in the final page speed measurements?
Darin Wright Reply:
May 20th, 2011 at 12:38 pm
Keep in mind that what RUM measures and what other tools measure may be different – the sample size, where, and what is actually measured. We do have an unsophisticated algorithm for discarding large outlier times so as not to skew the average. One important thing to mention is that we discovered a bug with DOM processing time reported by IE 6/7/8 as those browsers do not support the event we’re using to measure time spent DOM processing. There will be a fix out for this shortly. However, the overall response time does not change – just the breakdown between DOM processing and page rendering (load).
Our measurements stop when the load event is received – we rely on the browser to tell us when that happens. Asynchronous calls can / should happen after the load event.
Posted: 20 May 2011 at 12:17 pm by Tom
I wanted to update my comments.
I have a product suggestion. Add something to the interface that allows an RPM end user to filter out x% of the ‘outliers’. New Relic support was kind enough to send me a dump of 1000 rows of my raw data. What I found was that by dropping 20 of the 1000 rows, where it was obviously a problem on the end user’s machine, my data looked much more accurate. My average time dropped from 5 seconds to 3. The problem is, 10 users whose IE6 or Windows for Workgroups 386 takes 59 seconds to load the page can throw your entire average off by an enormous amount. I’m actually thinking of turning this feature off because it’s so far removed from what is really happening that it is almost not useful.
If I could drop the top 5% based on browser time (not server time, I want to keep the server time the same because that is actionable information) I would feel a lot better about the information being reported.
Really, what we as web admins want to know is what the experience is like for the vast majority of our users. Right now a small edge case of browsers almost doubles the average time being reported.
There are other algorithms out there for dropping outliers, based on median, but probably a configurable percentage of web browser time would be the easiest and most straightforward for everyone.
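For illustration, the suggestion above (drop a configurable top percentage of samples ranked by browser time, then average what remains) could be sketched like this. This is not an existing New Relic feature; the function name trimmedMeanPageLoad and the sample fields serverMs and browserMs are hypothetical.

```javascript
// Hypothetical sketch of a configurable outlier filter.
// samples: array of { serverMs, browserMs }; dropPercent: whole number 0-100.
function trimmedMeanPageLoad(samples, dropPercent) {
  // Rank by browser time only, so slow servers are never hidden.
  const sorted = [...samples].sort((a, b) => a.browserMs - b.browserMs);
  const dropCount = Math.floor((sorted.length * dropPercent) / 100);
  const kept = sorted.slice(0, sorted.length - dropCount);
  if (kept.length === 0) return 0;
  const totalMs = kept.reduce((sum, s) => sum + s.serverMs + s.browserMs, 0);
  return totalMs / kept.length;
}

// Made-up data: nine 1-second page loads and one 59-second browser stall.
const samples = Array.from({ length: 9 }, () => ({ serverMs: 200, browserMs: 800 }));
samples.push({ serverMs: 200, browserMs: 59000 });

console.log(trimmedMeanPageLoad(samples, 0));  // 6820
console.log(trimmedMeanPageLoad(samples, 10)); // 1000
```

A single stalled browser moves the plain average from 1 second to almost 7; dropping the top 10% by browser time restores the figure most users actually see.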
Darin Wright Reply:
May 20th, 2011 at 1:03 pm
Tom – thanks for the great feedback and discussion. I’ve added your suggestion to our list of features to consider.
Posted: 20 May 2011 at 12:45 pm by Tom
What about ajax loaded data? We generate a lot of charts using ajax loads, and my definition of “complete page” would include having all charts loaded. Is it possible or planned to include time for ajax loads in the measurements?
At the same time there is some stuff we want to always exclude. For instance Facebook Connect generates invisible iframes with long running connections (a comet style architecture) – not useful to report.
Darin Wright Reply:
May 24th, 2011 at 6:54 am
We’ve had several requests for reporting of asynchronous load information and custom measures. This is something we’ll be investigating/prioritizing in the near future.
Posted: 23 May 2011 at 6:19 pm by Mark Lanett
The common initialization pattern for jQuery is $(function() { … }) or $(document).ready(function() { … }). Is this counted in page rendering time?
Posted: 24 May 2011 at 11:45 am by Lawrence Wang
Yes – the code called inside a jQuery document.ready callback runs between the DOM ready event and the window load event (which we track). If you perform some jQuery logic inside a $(window).load callback, that in theory happens after the window is fully rendered and visible to the user… we don’t track that as part of the response time.
Posted: 24 May 2011 at 1:23 pm by Darin Wright
How does the backend time tracking work in the case of a first page visit (for example a user clicks on an external link to our site) in browsers that do not support the Web Timing API?
I guess that the cookie approach can’t work in that case, so…
Darin Wright Reply:
June 27th, 2011 at 10:14 am
Good question – I’ve updated the RUM support article to describe how first hits work when there is no cookie. See: http://support.newrelic.com/help/kb/features/how-does-real-user-monitoring-work
Specifically, see the section titled “When navigation start time is unknown”.
Posted: 27 June 2011 at 10:10 am by Angel