How we provide real user monitoring: A quick technical review

This week New Relic released a whack of new and cool stuff. One of the coolest is Real User Monitoring (RUM). Based on the JavaScript library created by Google’s Steve Souders, RUM shows you what performance issues your actual users are experiencing on your site, right now. Finally, you can see beyond the app tier. After all, even when things are performing fine on the server, who knows what’s happening on the network and in the browser? New Relic.

What does it measure?

Real User Monitoring details how long pages take to load and where the time is spent – from the moment the user clicks until the page has completed loading. Typically, for each click, a request is sent over the network to a server where it sits in a queue waiting to be processed. Eventually it hits the front of the line and is processed by the app, creating a response. The response is then sent back over the network and is loaded and rendered in the browser.

End user request timeline

Here’s how New Relic displays page load time for an app. It’s very similar to the timeline shown above except that network time is combined into the total round trip over the internet.

Browser page load breakdown graph

Page load time is broken down into five components:

  • Request queuing: The wait time between the web server and the application code.
  • Web application: The time spent in the application code.
  • Network: The time it takes for a request to make a round-trip over the internet.
  • DOM Processing: Time spent in the browser parsing and interpreting the HTML.
  • Page Rendering: Time spent in the browser displaying the HTML, running in-line JavaScript and loading images.

The graph makes it easy to see how a spike in one component, like network time, can affect overall page load times.
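To make the arithmetic behind the breakdown concrete, here is a small sketch of how the five components relate to the raw timestamps. The field names and numbers are illustrative only, not New Relic’s internals: queue and app time come from the server-side agent, while the browser-side timestamps come from the injected script.

```javascript
// Sketch: deriving the five page load components from raw timestamps.
// All names are illustrative, not New Relic's actual internals.
function pageLoadBreakdown(t) {
  // Everything between navigation start and first byte is "back end":
  // queue + app + network.
  const backend = t.firstByte - t.navigationStart;
  return {
    requestQueuing: t.queueTime,                      // measured by the agent
    webApplication: t.appTime,                        // measured by the agent
    network: backend - t.queueTime - t.appTime,       // inferred remainder
    domProcessing: t.domContentLoaded - t.firstByte,  // parsing the HTML
    pageRendering: t.loadEvent - t.domContentLoaded   // rendering, images, inline JS
  };
}

const parts = pageLoadBreakdown({
  navigationStart: 0, firstByte: 400, queueTime: 30, appTime: 120,
  domContentLoaded: 900, loadEvent: 1500
});
// parts.network === 250 (400ms back end minus 30ms queue and 120ms app)
```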

Page load time is further broken down by geographical location. For example, you can see how page load times vary around the planet.

World map of countries by average page load time

And you can choose up to five countries to explore regional differences. For example, you can see what the average network time looks like across the United States.

Average network times in the United States

The data is displayed in tables as well as maps, and you can render the geographical data many ways – by page load time, network time, heaviest users, Apdex, throughput and sessions.

The data is also sliced and diced by the pages on your site. You can easily find the most heavily used pages on your site and drill into their performance. And of course, you can easily link from the end user tier to the app tier for a particular page.

All the gory detail of what happens in the application for a particular request is at your fingertips.

Web page drill down and link to app tier

You can also see what browsers people are using to view your site and how their performance varies.

How does it work?

If you’re already familiar with New Relic’s application monitoring, then you probably know about New Relic’s agents that run in-process on web apps, collecting and reporting all sorts of details about what’s happening in the app. RUM leverages the agents to dynamically inject JavaScript into pages as they are built. The injected JavaScript collects timing information in the browser and contains details that identify the specific app and web transaction processed on the backend, as well as how time was spent in the app for each request. When a page completes loading in an end user’s browser, the information is sent back to New Relic asynchronously – so it doesn’t affect page load time.
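The general fire-and-forget beacon technique looks roughly like the following sketch. This is not New Relic’s actual script; the endpoint URL and field names are assumptions for illustration. The timing payload is serialized into a query string, and in a browser it would be fired after the load event (for example via a 1x1 image request) so the report never blocks the page.

```javascript
// Sketch of the async beacon technique (not New Relic's actual script).
// Builds the query string a browser-side agent would fire after "load".
function buildBeaconUrl(endpoint, payload) {
  const qs = Object.keys(payload)
    .map(k => k + '=' + encodeURIComponent(payload[k]))
    .join('&');
  return endpoint + '?' + qs;
}

const url = buildBeaconUrl('https://beacon.example.com/rum', {
  app: '12345',            // app identifier (hypothetical)
  txn: 'obfuscatedName',   // obfuscated transaction name (hypothetical)
  load: 1500               // measured page load time in ms
});
// In a browser, the script would then send it without blocking the page:
//   window.addEventListener('load', () => { new Image().src = url; });
```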

You can turn RUM on or off via your application settings in New Relic. You can also turn it on or off via the agent’s configuration file (newrelic.yml – a ‘browser_monitoring auto_instrument’ flag has been introduced).
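A minimal newrelic.yml fragment toggling that flag might look like this (the surrounding keys vary by agent and version; only the ‘browser_monitoring auto_instrument’ flag is named in the text above):

```yaml
common: &default_settings
  license_key: '<YOUR_LICENSE_KEY>'
  browser_monitoring:
    # Set to false to stop the agent from injecting the RUM JavaScript.
    auto_instrument: true
```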

The agents have been enhanced to automatically inject JavaScript into the HTML pages so using RUM is as simple as checking the checkbox on the New Relic control panel. However, if you’d prefer more control, you can use New Relic’s Agent API to generate the JavaScript and thus control exactly when and where the header and footer scripts are included.

Time spent in the browser is known as the “front end”, whereas time spent on the network and in the app is known as the “back end”. The JavaScript header is the first snippet executed as the page loads. It records a “first byte time” and then asynchronously loads a small JavaScript library that gets inserted into the DOM and is cached by the browser. The library listens for the “DOMContentLoaded” and page “load” events in order to determine how much time is spent processing the DOM and rendering the page respectively, relative to the first byte time. The footer is executed last and simply provides context for the measurements being reported – an application identifier, an API license key identifying the account, how much time was spent in the queue and in the app, as well as an obfuscated web transaction name. The agent obfuscates transaction names so that your application implementation details are not revealed in the page’s source code.
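As an illustration of the obfuscation idea only – the post doesn’t describe New Relic’s actual scheme – a simple reversible approach is to XOR the transaction name with a key and base64-encode the result, so the name never appears verbatim in the page source but the backend can recover it. Node.js sketch:

```javascript
// Illustrative reversible obfuscation (NOT New Relic's actual scheme):
// XOR each character with a key, then base64-encode the result.
function obfuscate(name, key) {
  let out = '';
  for (let i = 0; i < name.length; i++) {
    out += String.fromCharCode(name.charCodeAt(i) ^ key.charCodeAt(i % key.length));
  }
  return Buffer.from(out, 'binary').toString('base64');
}

function deobfuscate(encoded, key) {
  const raw = Buffer.from(encoded, 'base64').toString('binary');
  let out = '';
  for (let i = 0; i < raw.length; i++) {
    out += String.fromCharCode(raw.charCodeAt(i) ^ key.charCodeAt(i % key.length));
  }
  return out;
}
```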

In order to determine how much time is spent on the backend, we have to know when a navigation began (relative to the first byte time recorded by the header). This time is determined differently depending on the API a browser supports. The Web Timing API, a new standard supported by browsers such as Chrome and IE9, provides the navigation start time. In other browsers, a cookie is used to record the navigation start time as a page is hidden or unloaded via the “pagehide” and “beforeunload” events respectively – whichever comes first or is supported by the browser. As a user moves from one page to the next, the times are recorded: navigation start, first byte, DOM load, and page load. Since the JavaScript footer contains the app and queue times measured by the agent, network time can be computed as the remaining backend time.
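The fallback logic described above can be sketched as follows. The function names and cookie handling are assumptions for illustration; the only facts taken from the text are the two sources of navigation start time (the Web Timing API where supported, otherwise a cookie written on the previous page’s unload).

```javascript
// Sketch: choosing a navigation start time (names are illustrative).
function navigationStart(win, cookieValue) {
  // The Web Timing API (e.g. Chrome, IE9) exposes the timestamp directly.
  if (win.performance && win.performance.timing) {
    return win.performance.timing.navigationStart;
  }
  // Otherwise fall back to a timestamp the previous page stored in a
  // cookie on "pagehide"/"beforeunload" (cookieValue is an assumption here).
  return cookieValue ? parseInt(cookieValue, 10) : null;
}

// Given navigation start, network time is the back end remainder.
function networkTime(navStart, firstByte, queueMs, appMs) {
  return (firstByte - navStart) - queueMs - appMs;
}
```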

How do we do it?

New Relic operates a set of highly efficient data collectors. The collectors preprocess and aggregate browser metrics before sending them along to our analytics system for presentation. We’ve chosen a hybrid cloud data model to allow us to scale with demand. Currently, the collectors are performing with a 0.16 millisecond response time (i.e. to process each ping back from an end user browser) and we’re only consuming 5% of our servers’ capacity. This model will allow us to scale as demand grows. Just think – each time one of our customers gets a hit, so does New Relic.

RUM is also integrated with New Relic’s Apdex score – you can define a page load target time for your app to get a better view into your customers’ satisfaction. End user details are included in all of New Relic’s standard reports and emails. RUM is also integrated with New Relic’s alerting system, so you can be notified when end user performance degrades – even when application performance does not.
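The standard Apdex formula counts a page load as satisfied if it finishes within the target T, tolerating if within 4T, and frustrated otherwise; the score is (satisfied + tolerating/2) divided by the total sample count. A minimal sketch of that calculation:

```javascript
// Standard Apdex calculation against a page load target (targetMs = T).
// Satisfied: load <= T; tolerating: T < load <= 4T; frustrated otherwise.
function apdex(loadTimesMs, targetMs) {
  let satisfied = 0, tolerating = 0;
  for (const t of loadTimesMs) {
    if (t <= targetMs) satisfied++;
    else if (t <= 4 * targetMs) tolerating++;
  }
  return (satisfied + tolerating / 2) / loadTimesMs.length;
}
```

With a 250ms target, loads of 100, 300, 500, and 2000ms yield one satisfied, two tolerating, and one frustrated sample, for a score of 0.5.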

What next?

It’s included at all price levels. If you don’t already use New Relic, you can always try it out for free with our “two minutes to first light” install. You’ll get insight into how long users are waiting for page loads, where they’re coming from, and whether they’re having different experiences based on where they are or what they’re using.

Darin is one of the principal RUM engineers at New Relic. You might know him better for his work on Eclipse. He's spent the majority of his career developing software tools but has also delved into audio software at the Banff Centre.
