How We Tune Our Own App Using RUM Data

Around here we drink our own champagne and we use New Relic to tune New Relic. We recently set some fairly aggressive performance standards for ourselves by lowering our end user Apdex-T value to 2.4 seconds, which is rather fast. We’re saying that our users expect our pages to load in 2.4 seconds or less. Our Apdex score immediately took a dive below 0.8. We’re currently working on bringing that back up to 0.95 or higher. Here’s how I used our Real User Monitoring (RUM) data to get closer to meeting that goal.

Where do you start?

As web developers, we tend to be really good at thinking about how to tune our server-side code. But our users don’t care about how fast our servers are. They care about how fast their pages load. That’s a challenge for us — we have to start thinking about end-to-end transaction times and work on optimizing in-browser. With RUM data, that’s easy to do, but many developers are unfamiliar with how to start approaching the problem.

When tackling an optimization problem like this, I start on the end user transactions overview. If I’m lucky, I’ll find some low-hanging fruit. Looking at the data for our app, here is what I found:

73% of total page load time spent on one page

It looks like, of all the time spent loading pages across all of our users, 73% was spent in one place! That’s some juicy low-hanging fruit.

PublicAccess::ChartsController#show is the page that renders when a user views an embedded chart. Many of our customers love the embedded charts and use them a lot. So I’m not surprised that it’s on this list. But the embedded chart is a very small page. It’s just a chart with a title. How could it be taking that much of the total end user time? And more importantly, how can I tune it?

The drilldown chart to the right shows me where this page is spending its time. The biggest chunk of time is spent in transit over the network, which is no surprise. DOM processing time is the second-biggest offender. This is true for the aggregate data; let’s see if it applies to the worst offenders.

How bad does it get?
Scrolling down on the transaction drilldown, I get links to some browser traces. These are individual requests that had really bad performance. Let’s take a look at one of them.

Browser trace screenshot

An 8 second page load for a page that consists of a single chart!?! This needs some help. But look at that DOM processing time! It’s over 5 seconds! That’s awful. Let’s see what we can do with it.

What is DOM processing?
In order to tune this transaction, it helps to understand what DOM processing really is. It’s the time spent by the browser parsing the HTML and JavaScript, and applying CSS rules. If you have any script tags in the body of your document, their execution is part of DOM processing. In order to figure out what’s going on with this page, we need to open it up with either Firebug or the Webkit Developer Tools and look at the asset waterfall. Here’s what that looks like for a random embedded chart:
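Under some assumptions about field names, you can compute a similar breakdown yourself from the browser’s Navigation Timing data. A minimal sketch (the phase boundaries are my approximation of how RUM divides them, not New Relic’s exact definitions):

```javascript
// Sketch: splitting a page load into the phases discussed above, using
// fields from the W3C Navigation Timing API. The exact boundaries RUM
// uses may differ slightly; this is an approximation.
function pageLoadPhases(t) {
  return {
    network: t.responseEnd - t.navigationStart,                    // request + response in transit
    domProcessing: t.domContentLoadedEventStart - t.responseEnd,   // parse HTML, run scripts, apply CSS
    pageRendering: t.loadEventStart - t.domContentLoadedEventStart // remaining work up to body.onload
  };
}

// In a browser console: pageLoadPhases(window.performance.timing)
```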

Waterfall Before Tuning

There’s a lot going on in here. Let’s look at the most important points:

1. We’re serving 165KB of minified JavaScript and 29KB of minified CSS. That’s a lot of code!
2. It’s taking 1.59 seconds for the RUM timing to end (the vertical red line, which marks the body.onload event).
3. We’re loading the chart data as part of the RUM time (see the data.json line).

Now we’ve got some actionable data to use. Let’s start tuning!

Tuning the page
It looks like there are two simple things we can do to tune this page: reduce the size of our JavaScript and CSS, which will cut download and parsing times, and defer the chart’s data.json call until after the body.onload event.

CSS tuning
On the backend, we use Sass to write our CSS. We’ve got our SCSS source files fairly well split up. At deploy time we generate the full CSS, concatenate the files and minify them into the CSS files that we actually serve. Looking at the CSS bundle we were using for this page, we had already thrown out a bunch of the component files. Our sitewide layout, grid structure and form styles were already gone. But looking at what was left, I found that we had a full copy of the CSS required to style all of the site-wide UI widgets we use. That includes charts, but also things like the recent events list, data tables, and all kinds of other stuff.

Since this page is serving embedded charts, I know that I only need the styles for charts and nothing else. So I pulled out the chart-specific CSS and threw it into another file. Then, the CSS for this page could just use the chart-specific CSS and not anything else. That’s a win.

JavaScript tuning
Now it was time to tackle the JavaScript. Similarly to our CSS, we use Jammit to package up and minify the JavaScript code. I took a look at what was included in the JS bundle for this page, and I found a few things. First, we have a lot of code required to handle the charting system. There’s a full copy of jQuery, a copy of Highcharts.js, some utility functions for handling banner messages, code for embedding Flash (since some of our charts are Flash, like our maps), the JavaScript for the Flash-based charting library we use when our users’ browsers don’t support Highcharts, and then the kicker: a full copy of jQuery UI.

Wow. Our little embedded charts page had all the code necessary for rendering accordion and calendar UI widgets, performing drag and drop operations, and all of the other goodies that jQuery UI provides. That seems like a bit much, doesn’t it?

A little investigation revealed that our wrapper system for Highcharts was a jQuery widget that depended on jQuery UI’s widget generation code. And so we’d included the whole library. In a few minutes I’d built a custom version of jQuery UI that only contained the widget generation code, and swapped that out for the full jQuery UI code.
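In Jammit terms, the swap is just a matter of pointing the page’s package at the custom build. A hypothetical sketch of the relevant assets.yml entry (the file names are invented for illustration, not our actual asset layout):

```yaml
javascripts:
  embedded_charts:
    - public/javascripts/jquery.js
    - public/javascripts/jquery.ui.widget.custom.js  # custom build: widget factory only
    - public/javascripts/highcharts.js
    - public/javascripts/chart_wrapper.js            # our Highcharts jQuery widget
```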

Event tuning
The last thing to work on is the ordering of events. We’re loading the chart before the body.onload, but it doesn’t really need to happen that early. We can defer it for a bit and users will still feel that the page is snappy. We do what most jQuery users do and wrap all of our JS in the following pattern:

jQuery(function($) {
  // do stuff here, like load a chart
});

This is shorthand for the following:

jQuery(document).ready(function($) {
  // do stuff here, like load a chart
});

The business code here is attached to the document.ready event. This is a jQuery-specific event that’s closely tied to the browser’s DOMContentLoaded event, and it fires as soon as the DOM has been parsed. The body.onload event fires later, once the page’s resources have finished loading and the JS attached to document.ready has run. That’s when the RUM timing stops.

Looking at our chart loading code, we were following this pattern. I changed it around a bit, to this:

jQuery(window).load(function() {
  // do stuff here, like load a chart
});

This simple change deferred all chart loads, site-wide, until after the body.onload event.
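A note of caution if you reproduce this today: the `.load()` event shorthand was removed in jQuery 3, so the modern spelling is `jQuery(window).on('load', fn)`. The deferral logic itself is tiny; here’s a framework-free sketch (the helper name is mine, not New Relic’s code):

```javascript
// Minimal sketch of the deferral: run the work immediately if the page
// has already loaded, otherwise wait for the load (body.onload) event.
function runAfterLoad(doc, win, fn) {
  if (doc.readyState === 'complete') {
    fn(); // load already fired; a handler attached now would never run
  } else {
    win.addEventListener('load', fn);
  }
}

// Usage in a browser: runAfterLoad(document, window, loadChart);
```

Checking `readyState` first matters: a handler attached after the load event has already fired will never be called.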

Let’s see what our page load looks like now that the CSS and JavaScript have been tuned, and we’ve changed our event sequencing a bit.

Screenshot of the waterfall chart

This looks a lot better! Once again, let me point out the important features:

1. Our JavaScript is down 67KB (41%) to 98KB, and our CSS is down an order of magnitude (91%), from 29KB to 2.5KB.
2. The chart data load is happening after the body.onload event, so our timing stops earlier.
3. It looks like we’ve shaved half a second off of the page load time.

This is great! But we’re not really sure how this will affect us in production yet. Comparing individual page loads, like this waterfall to the previous one, isn’t very reliable. If you hit refresh a few times, you’ll see that page load times can vary quite a bit. So I’m not ready to trust my half-second speed boost yet.

Production effects
The proof, as they say, is in the pudding. Let’s deploy our changes and see what happens. Thankfully at New Relic that happens relatively frequently, so my code was released 24 hours after I made my changes.

First stop: the RUM web page view for our transaction:

Embedded Chart Post Tuning

The two vertical lines are two successive deploys. The first deploy released the optimizations we made. You can ignore the second one — we had a minor regression in the first deploy, unrelated to the optimizations, that we patched.

It looks like we really did shave half a second off of the page load times! And that half second came out of the DOM processing time, which is exactly where we expected it to come from. Excellent! So far my expectations have been met.

By constraining the time window to a post-deploy-only view (not shown here), I saw that the transaction’s contribution to the total page load time is down from 73% to 56%. That’s a great win. It’s still 56%, though, which is largely due to the number of hits the page gets. So there might be some more tuning to be done later. But that’s for later. For now, I want to quantify the effect of the tuning we did.

I’ll bet that our tuning had site-wide effects for two reasons. First, this one transaction is still taking up a significant proportion of the total page load time. Second, when we deferred the chart loading until after the body.onload event, we affected every chart loaded on our site.

Here’s a chart showing our sitewide RUM performance:

Sitewide RUM Post Deploy

The good news: our optimizations had a site-wide effect! It looks like we shaved almost a half-second off of the site-wide page load time. The embedded charts page gets a lot of traffic, so tuning it gave us a lot of bang for the buck. If you look closely at the Apdex chart, you’ll see that we improved our Apdex score by almost five points (0.05)!

All in all, the tuning work took only a couple of hours. I was lucky that we had some very low-hanging fruit to work with, so I was able to get a lot of bang for only a little buck. It’s rare that a single web page has such a strong effect on site-wide performance. However, even if you don’t have that advantage, you may still be able to make some big strides in tuning your end user performance.

Look for site-wide problems
The biggest wins we had came from removing excess JavaScript and deferring the chart loading. These tuning patterns weren’t all that specific to the one transaction we worked on. They’re patterns that you can apply everywhere. Let’s look at what you can do with your site.

Reduce the amount of code
It’s all too tempting with modern tools like Jammit or Sprockets to package up all of your JavaScript into a single file for download. This can be problematic for several reasons:

1. Overly large JavaScript files aren’t cached on mobile devices, so they’ll be re-downloaded on every page request by your slowest users.
2. Even if a file is cached, all of the JavaScript has to be processed on each request. Some browsers are more clever about this than others and cache the parsed JavaScript code. But as developers, we can’t rely on the browsers to do the right thing. With the explosion of devices and browsers we’ve seen in the past couple of years, assumptions about how browsers do things are dangerous.

By serving JavaScript conditionally, you’re adding an asset request (or more) to every page load, and you’re decreasing in-browser cache hit rates for your JavaScript assets. But there’s a much bigger win to be had by reducing the amount of JavaScript that needs processing on each page load. The fastest code is code that doesn’t exist. Coincidentally, it also has no bugs.

There are two extremes here: a single, monolithic JavaScript file versus a separate file for each page you serve. You’ll have to find the balance between those two, and that balance will depend entirely on your site’s architecture. A single-page Backbone.js app will want a monolithic file; a more traditional web application will want more than that.
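One way to find that balance is a handful of area-level bundles rather than one per page. A hypothetical sketch (the bundle names and contents are invented for illustration):

```javascript
// Hypothetical per-area bundle map: broader than one-file-per-page,
// leaner than a single monolithic bundle.
const bundles = {
  embedded_charts: ['jquery.js', 'jquery.ui.widget.js', 'highcharts.js'],
  dashboard: ['jquery.js', 'jquery-ui.js', 'highcharts.js', 'tables.js', 'events.js']
};

function assetsFor(area) {
  // Unknown areas fall back to the full bundle rather than breaking the page.
  return bundles[area] || bundles.dashboard;
}
```

A few well-chosen areas keep cache hit rates high while sparing lightweight pages the cost of parsing code they never run.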

Similar arguments apply to your CSS rules. By splitting up your CSS and removing CSS code from each page load, you’re making the application of rules to each and every DOM element that much faster. Lean and mean is the way to go.

Don’t be afraid to re-define what “page loaded” means: it’s about users
The deferral of chart loading to after the body.onload event can be seen as sleight of hand. We simply moved the chart loading time outside of the RUM timer’s domain. It seems like that might be dishonest, like we’re cheating on our numbers, but we’re really not.

The RUM timer stops when the browser has rendered and the user is able to interact with the page. For our site, as chart-heavy as it is, that can occur before the charts have loaded. Users can orient themselves to the page, and start looking at summary metrics that are displayed outside the charts. Think about the humans using our site. Do they need the charts to start interacting with a page? No. Will the half-second it takes for the charts to load be noticed? Yes, but just barely. For the humans concerned, the page feels loaded when they can see the charts’ loading bars, so let’s use that perception to drive our tuning, and defer the chart loading. The body.onload event should fire at the moment the user can start extracting meaning from the page. It’s up to you to decide what that means, and adjust your JavaScript code accordingly.

Performance matters!
Lastly, don’t forget that performance matters to your users. They don’t want to wait for your content: they want to engage with your content. Give ’em what they want, when they want it and you’ll have happy users.

Happy tuning!


Brent Miller is a principal engineer & architect for New Relic. He traded in his training as a botanist to become a frontend engineer, and has spent the past decade building UIs that are easy to manage and helping the engineers around him become better at what they do.
