This is the first in a series of three guest posts demonstrating the basic principles of performance tuning from Flood IO co-founder and CTO Tim Koopmans. Part two addresses slow transaction and database traces, while part three covers finding additional functional errors.
Performance tuning applies the scientific method: an iterative process that corrects and integrates previous knowledge. Each iteration starts with a defined question and an explanatory hypothesis that can be tested in a reproducible manner. This process is essential to measuring, evaluating, and improving system performance.
To demonstrate how application performance tuning works in real life, we used Flood IO, a distributed load testing platform that lets you scale out your JMeter or Gatling load test scenarios across the globe within minutes, to generate the load, and New Relic Pro to help profile and analyze the application whilst under load, including detailed transaction traces and code-level visibility.
For our Application Under Test, we chose an intentionally poor-performing Ruby on Rails app from the New Relic Code Kata, hosted on Heroku with free New Relic monitoring. We used the Ruby-JMeter gem to write a test plan and execute it for free on Flood IO. New Relic is free for Flood IO users and free for monitoring pre-production apps on Heroku.
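As an illustration, a minimal test plan in the ruby-jmeter DSL might look like the following sketch. The hostname, thread count, and loop count are placeholders rather than the actual plan we ran, and this assumes ruby-jmeter's `test`/`threads`/`visit` methods:

```ruby
require 'ruby-jmeter'

# Sketch of a ruby-jmeter plan: 3 concurrent users repeatedly
# requesting the many_assets page. The URL is a placeholder.
test do
  threads count: 3, rampup: 0, loops: 10 do
    visit name: 'many_assets', url: 'https://example-kata.herokuapp.com/many_assets'
  end
end.jmx # writes the plan as a JMX file, ready to upload to Flood IO
```

The same plan object can instead be run against Flood IO directly from the DSL, which is how we executed ours.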
The first step was to measure the baseline performance of our test app using Flood IO, which served as a reference point for subsequent performance tuning. (Note: we’ve included screenshots from various tools so you can follow the progress we made; the Flood IO transactions correspond to New Relic transactions but may have different names.)
Ouch! With just 3 concurrent users, many_assets had a mean response time (a primary metric of performance analysis) of 56s with a standard deviation of 14s.
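For clarity, the mean and standard deviation summarize a set of response-time samples like so. The sample values below are invented to reproduce the quoted figures; they are not the raw Flood IO data:

```ruby
# Three illustrative response times (seconds) whose mean is 56s
# and whose sample standard deviation is 14s.
times = [42.0, 56.0, 70.0]

mean     = times.sum / times.size
variance = times.sum { |t| (t - mean)**2 } / (times.size - 1)
std_dev  = Math.sqrt(variance)

puts "mean=#{mean}s sd=#{std_dev}s" # => mean=56.0s sd=14.0s
```

A large standard deviation relative to the mean, as here, tells you response times are highly variable, not just uniformly slow.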
Why was this so slow? New Relic Transactions showed us the slowest average response times:
(Note that the timing for many_assets is different in New Relic and Flood IO; the point is that the total page load time is vastly greater than the back-end time to generate the page.)
We determined from the routes that our point of interest (POI), the suspect code we’re examining, is the ManyAssetsController, which is linked to our many_assets transaction. But New Relic indicated that our POI was not the slowest transaction, so we had to consider another important metric of performance analysis: throughput.
Unlike response time, throughput is a quantity described as a rate. New Relic Transactions showed us the highest throughput:
The many_assets transaction in Flood IO had a mean completion rate of 3 requests per minute (rpm), but the ManyAssetsController in New Relic showed 1.72 rpm for the linked index method and another 688 rpm for the display method. Why was this so high?
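Before digging into the cause, it is worth spelling out how rpm figures like these are derived: throughput is simply requests completed divided by elapsed time. A quick sketch with made-up numbers, not measurements from this test:

```ruby
# Throughput as a rate: hypothetical counts over a 5-minute window.
requests_completed = 86
elapsed_seconds    = 300.0

rpm = requests_completed / (elapsed_seconds / 60.0)
puts format('%.1f rpm', rpm) # => 17.2 rpm
```

Because every embedded asset request hits the controller, one slow page view can inflate controller throughput far beyond the page-level rate.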
The page view itself gave us some hints: Too many images!
For a more detailed explanation, we took a look at the same transaction using YSlow:
Ouch again! This page makes 400 image requests on an empty cache. Worse, those same 400 images were requested on a primed cache. This would be particularly bad for high-latency links.
Using observations from Flood IO and New Relic, we formed an explanatory hypothesis for why response times were slow for this transaction.
To solve the problem, we experimented with CSS image sprites to serve this collection of images as a single request. This should reduce the number of round trips to the server, which in turn should decrease throughput on the controller and, hopefully, improve response time for that transaction.
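A CSS image sprite combines many small images into one file and uses background-position to reveal each one, so the browser makes a single request instead of hundreds. A minimal sketch, assuming hypothetical class names, a hypothetical sprite.png, and 20x20px icons:

```css
/* One combined image replaces hundreds of individual requests. */
.icon {
  background-image: url('sprite.png'); /* hypothetical combined file */
  width: 20px;
  height: 20px;
}
/* Shift the background to show each individual icon. */
.icon-1 { background-position: 0 0; }
.icon-2 { background-position: -20px 0; }
```

In the view, each former `<img>` tag becomes an element with the appropriate `.icon-N` class.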
We added CSS image sprites and repeated the baseline. Our prediction turned out to be sound. After the changes, many_assets was no longer in the top 5 slowest transactions in Flood IO. New Relic also helped confirm that we solved the throughput issue: no more evidence of 600+ rpm transactions.
This post introduced the basic concepts of performance tuning and how you can simulate load using Flood IO and analyze performance using New Relic. The second post in the series reveals how we tuned the remaining performance problems present in the test.