Performance Tuning with Flood IO and New Relic (Part 3): Finding Functional Errors

This is the third in a series of three guest posts from Flood IO co-founder and CTO Tim Koopmans demonstrating the basic principles of performance tuning.

In the first post of this series, we introduced the basic concepts of performance tuning and demonstrated how you can simulate load using Flood IO and analyze performance using New Relic. The second post used slow transaction and database traces to help identify and tune obvious problems in our application under test.

In this post, we use New Relic to track down the remaining problems and then use Flood IO to confirm that they are all fixed.

Custom instrumentation

New Relic includes an API you can use to collect additional metrics about your application. If you see large “Application Code” segments in transaction trace details, custom metrics can give you a more complete picture of what is going on in your application.

In our test application, Flood IO was still showing problems around caching. Looking at the source code revealed an existing tracer method in the CachingController.

This let us create custom dashboards within New Relic to present this data. It’s evident that no matter how many times this method is called, its response time never drops below 30ms.
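To illustrate what a method tracer does in principle, here is a minimal hand-rolled sketch in plain Ruby. It is not the New Relic agent API, and it is not the code from our test application; SimpleTracer, trace_method, CachingExample and slow_lookup are all hypothetical names, and the 30ms sleep is just a stand-in for the floor seen in the dashboard.

```ruby
# Hypothetical hand-rolled tracer for illustration; NOT the New Relic agent API.
module SimpleTracer
  # Recorded durations in seconds, keyed by metric name.
  def self.timings
    @timings ||= Hash.new { |hash, key| hash[key] = [] }
  end

  # Wrap an existing instance method so that every call records its duration.
  def trace_method(method_name, metric_name)
    original = instance_method(method_name)
    define_method(method_name) do |*args, &block|
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      begin
        original.bind(self).call(*args, &block)
      ensure
        elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
        SimpleTracer.timings[metric_name] << elapsed
      end
    end
  end
end

# Hypothetical class standing in for a controller with a slow method.
class CachingExample
  extend SimpleTracer

  def slow_lookup
    sleep 0.03 # assumed stand-in for roughly 30ms of work
    :value
  end
  trace_method :slow_lookup, "Custom/CachingExample/slow_lookup"
end
```

Calling slow_lookup a few times and then inspecting the minimum of SimpleTracer.timings for that metric name shows the same kind of floor that stood out on the dashboard.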

Looking at the code, we can see that this method tries to make use of Rails.cache. Closer inspection, however, shows that the key name being read differs from the key name being written, so every read misses and the cached value is never used.
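The kind of key mismatch described above can be sketched in plain Ruby. This is not the actual CachingController code, which is not reproduced here; TinyCache is a hypothetical in-memory stand-in for a cache, and Worker, lookup_buggy, lookup_fixed and the key names are assumptions made for illustration.

```ruby
# Tiny in-memory stand-in for a cache; hypothetical, not Rails.cache itself.
class TinyCache
  def initialize
    @store = {}
  end

  def read(key)
    @store[key]
  end

  def write(key, value)
    @store[key] = value
  end
end

# Stand-in for the expensive work; counts how often it actually runs.
class Worker
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def compute
    @calls += 1
    "result"
  end
end

# Buggy: reads one key but writes another, so every read misses.
def lookup_buggy(cache, worker)
  cached = cache.read("report")
  return cached if cached

  value = worker.compute
  cache.write("reports", value) # key differs from the one read above
  value
end

# Fixed: a single constant keeps the read key and the write key identical.
REPORT_KEY = "report"

def lookup_fixed(cache, worker)
  cached = cache.read(REPORT_KEY)
  return cached if cached

  value = worker.compute
  cache.write(REPORT_KEY, value)
  value
end
```

With the buggy version the expensive work runs on every call; with the fixed version it runs once. In a Rails application, Rails.cache.fetch with a block takes a single key for both the read and the write, which makes this class of mistake harder to introduce.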

We quickly deployed a fix and confirmed success manually with a single browser session.

External services

The tweets transaction is also slow, and further investigation showed that the majority of the time was spent in a call to an External service: Net::HTTP[twitter.com]: GET

flood.io3_image1

Outbound calls to Twitter from the TweetsController are going to be expensive under concurrent load.

flood.io3_image2

By simply caching at the page level, we can get away with not having to execute the controller code for every request, thereby limiting the number of outbound calls made to an external service.
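The effect of putting a cache in front of an expensive outbound call can be sketched as follows. This is a simplified, single-threaded illustration in plain Ruby rather than Rails page caching itself; ExpiringCache, its ttl_seconds argument and the injectable clock are hypothetical and exist only to make the idea easy to see.

```ruby
# Hypothetical time-limited cache placed in front of an expensive call.
# Single-threaded sketch; a real deployment would need a shared, thread-safe store.
class ExpiringCache
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @value = nil
    @expires_at = nil
  end

  # Return the stored value while it is fresh; otherwise run the block,
  # store its result, and restart the expiry timer.
  def fetch
    now = @clock.call
    if @expires_at.nil? || now >= @expires_at
      @value = yield
      @expires_at = now + @ttl
    end
    @value
  end
end
```

Wrapping the outbound call in fetch means that, within one expiry window, only the first request pays for the trip to the external service and every other request is served from memory. The trade-off is that responses can be stale by up to the chosen expiry time.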

Errors

Last but not least, we wanted to track down error events. New Relic’s event monitor makes this easy. We can get an idea of the error rate and of when errors are occurring under load:

flood.io3_image3

We can also get a breakdown of the types of errors that occurred:

flood.io3_image4

The stack trace pinpointed exactly where in our application code things were going wrong:

flood.io3_image5

These are simple, innocuous functional errors, but the cost of serializing stack traces and handling those errors in a production environment can still be high. So it makes sense to resolve the division by zero error being reported.
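One common way to resolve a division by zero error is to guard the divisor before dividing. The failing calculation from our application is not shown here, so the following is only a sketch; safe_ratio and its arguments are hypothetical.

```ruby
# Hypothetical helper; the actual failing calculation is not shown in this post.
def safe_ratio(numerator, denominator)
  # Integer division by zero raises ZeroDivisionError in Ruby,
  # so return nil instead of letting the request fail.
  return nil if denominator.zero?

  numerator.to_f / denominator
end
```

Returning nil pushes the decision about what to display to the caller. Whether nil, zero or a validation error is the right answer depends on what the value means in the application.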

Confirmation

The final part of a performance tuning test effort is to confirm that all the iterative changes hang together.

Our last baseline showed much better response time averages across the board and easily satisfied the 4s target. We also eliminated all errors under load.

flood.io3_image6

Once we’ve whipped the application under test into shape, it’s time to start load testing. We chose an arbitrary concurrency of 1,000 users with a response time target of less than 4s. We scaled out with 6x Heroku dynos and 3x grid nodes in Flood IO across the East and West coasts of the U.S. as well as Australia. We also contributed a Flood IO plugin to the New Relic Platform (see more information on the integration in this Flood IO blog post).

flood.io3_image7

flood.io3_image8

Flood IO and New Relic clearly make a powerful performance-tuning team. The great thing about the combination is that they put all the information in one place.

Of course, you can always keep performance tuning. Now that we’ve ‘fixed’ the initial round of performance defects, it will be easier to identify any new problems under sustained load. For example, it looks like request queuing is happening on the Heroku dynos, but that’s a story for another day…

Tim Koopmans is introducing more and more people to the art of performance testing and debunking myths along the way. If you’re already monitoring your application’s performance, Tim will show you practical ways to correlate the myriad of metrics available into meaningful information. Tim is also author of a popular Ruby-based DSL for JMeter and CTO of flood.io.
