Pinpoint Bottlenecks in Real Time with New Relic and BlazeMeter’s New FollowMe

This guest post comes from Itay Mendel, who runs DevOps for BlazeMeter, the load-testing cloud.

Performance testing has come a long way.

In its infancy, methods were a tad primitive. If you wanted to know where your test would break, you’d give all your colleagues a stopwatch and a set of instructions. They’d simultaneously browse the application and log the response times.

Sure, you could find bottlenecks that way, but adapting the tests to pinpoint where and how they were caused was both time-consuming and likely to make you very unpopular.

Pinpointing bottlenecks by writing (and rewriting!) scripts

Thank goodness times have changed. The following scenario is probably more familiar to you:

  1. Set up a scenario using a scripting tool such as JMeter
  2. Run the test
  3. Hit a bottleneck
  4. Go back to the scenario, rescript, and reconfigure
  5. Run the test again
  6. Hit another bottleneck
  7. Try not to scream and do it all over again
  8. And again … and again … and again …

You can view problems; you can even see why they’re happening. But it still takes many iterations to pin down the precise point at which bottlenecks occur and to resolve the issue.
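To make that loop concrete, here’s roughly what the “run the test” step looks like when you script it yourself. This is a minimal, hypothetical Python sketch rather than JMeter or BlazeMeter code; the URL, user count, and timeout are placeholders:

    # load_test.py: a hypothetical sketch of the script-it-yourself approach.
    # The URL, user count, and timeout are placeholders.
    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests  # third-party: pip install requests

    URL = "https://staging.example.com/blog/some-article"  # placeholder
    USERS = 400  # concurrent virtual users

    def fetch(url):
        """Request the page once and return the response time in seconds."""
        start = time.monotonic()
        requests.get(url, timeout=30)
        return time.monotonic() - start

    def run_load(url, users):
        """Hit the URL with `users` concurrent requests; return the average time."""
        with ThreadPoolExecutor(max_workers=users) as pool:
            timings = list(pool.map(fetch, [url] * users))
        return sum(timings) / len(timings)

    if __name__ == "__main__":
        avg = run_load(URL, USERS)
        print(f"{USERS} users -> average response time {avg:.2f}s")

Every time you hit a bottleneck, you edit the parameters, rerun the script, and wait for results all over again, which is exactly the churn described above.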

How to pinpoint issues and adapt your test on the fly

Fortunately, there is a way to adapt your test without rewriting the script or reconfiguring the scenario. In fact, you don’t even need to write the script in the first place.

You test automatically as you surf the Web. And as soon as you hit a problem, you can adapt your test scenario on the fly and carry on testing.

It’s made possible by a new BlazeMeter feature called FollowMe, which allows you to test automatically by instantly triggering a load of up to 100,000 virtual users to follow you as you browse the Web.

Running dynamic load tests: A step-by-step guide

To show you how to do this, I ran some load tests of my own. I wanted to see where the bottlenecks occurred and what was causing them, so I set up two different screens. One had my Web application open. The other displayed both the New Relic and BlazeMeter dashboards, letting me see what my ‘followers’ were doing and what was going on with my app through real-time performance reports.

Step 1: A simple load test

I started off by running a simple load test with 400 concurrent users. Using BlazeMeter’s Chrome extension, I configured FollowMe to set the test to 400 users and specified the origin location, user agent, and domain that I wanted to follow. Once I’d made those selections, the test was ready to run. Then all I had to do was surf the Web.

[Screenshot: configuring the FollowMe test in BlazeMeter’s Chrome extension]

Step 2: Check response times

I decided to use BlazeMeter’s staging environment for my test. I clicked through ten pages with this virtual network of 400 users ‘following’ me with no problems. However, when I clicked on one of the blog articles, I saw that response times had gone through the roof, with an average page load of around 15 seconds. Clearly there was a bottleneck hiding somewhere. In fact, the response time was so high that I assumed we’d hit more than one issue along the way.

[Screenshot: BlazeMeter report showing the spike in response times]

Step 3: Begin pinpointing the bottleneck

To pinpoint the primary bottleneck behind this issue and the precise point at which we hit it, I needed to lower the number of users.

If I’d run the test the ‘traditional’ way, I’d have gone back to the drawing board, reconfigured the users, adapted the script, and started the test all over again.

This time, I just went back to the FollowMe button, lowered the concurrency to 340 users, and clicked straight back onto the problematic page.

[Screenshot: lowering the concurrency in the FollowMe panel]

Step 4: Keep testing

Response times were still too high, so I altered the test once more, this time down to 300 users. Again, this took me just a couple of seconds.
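What FollowMe let me do with a couple of clicks amounts to a simple downward search over concurrency levels. Sketched in hypothetical Python, reusing the run_load helper from the earlier example (the 5-second threshold and 50-user step are arbitrary illustrations, not BlazeMeter defaults):

    # find_breaking_point.py: a hypothetical sketch of the step-down search.
    from load_test import run_load  # the earlier illustrative helper

    URL = "https://staging.example.com/blog/some-article"  # placeholder
    ACCEPTABLE_SECONDS = 5.0  # arbitrary stand-in for a "healthy" response time

    def find_breaking_point(url, start_users, step=50):
        """Step concurrency down until response times recover, then report
        the lowest level that was still too slow."""
        users = start_users
        while users > 0:
            avg = run_load(url, users)
            print(f"{users} users -> {avg:.2f}s average")
            if avg <= ACCEPTABLE_SECONDS:
                return users + step  # the previous level was still breaking
            users -= step
        return 0

    if __name__ == "__main__":
        print("breaking point is around", find_breaking_point(URL, start_users=400))

The search itself is trivial; the point is that FollowMe turns each iteration from a rescript-and-rerun cycle into a couple of seconds of clicking.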

Step 5: Identify the bottleneck

With 300 users, it was fairly easy to deduce where the first bottleneck occurred.

[Screenshot: BlazeMeter report at 300 users]

So I checked my New Relic dashboard to see what was causing this problem.

[Screenshot: New Relic dashboard overview]

Clearly the issue was occurring in the PHP layer, but I wanted to drill down further for more precise data. Doing so showed that the main bottleneck was an I/O issue in the file_scan_directory component.

[Screenshot: New Relic drill-down into the file_scan_directory I/O issue]
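For readers who haven’t met it, file_scan_directory is Drupal’s helper for recursively scanning a directory on disk. If it runs on every request, the repeated disk walks pile up quickly under load. The sketch below is an analogy in Python, not the Drupal code or our actual fix; it just illustrates the failure mode and the usual remedy of caching the scan result:

    # scan_cache.py: an illustrative analogy for the file_scan_directory issue.
    import os
    from functools import lru_cache

    def scan_directory(root):
        """Walk the whole tree on every call: cheap once, costly per request."""
        found = []
        for dirpath, _dirnames, filenames in os.walk(root):
            found.extend(os.path.join(dirpath, f) for f in filenames)
        return found

    @lru_cache(maxsize=None)
    def scan_directory_cached(root):
        """The same walk, but computed once per path and reused thereafter."""
        return tuple(scan_directory(root))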

Step 6: Fix the problem

That was it! I now had all the information I needed to go back to the development environment and fix the problem.

Because I was able to make instant changes to the test on the fly, the whole process of running three different load tests took me less than 20 minutes.

You can find out more about running instant script-free tests at http://www.blazemeter.com/FollowMe

Author bio: Itay Mendel joined BlazeMeter as the quintessential DevOps guy, intertwining the development and operations lifecycle with a focus on the infrastructure of systems and projects. Before joining BlazeMeter, Itay was a sysadmin and DevOps consultant at DevOps Israel, focusing on the development of agile and DevOps methodologies. He also spent several years as a programmer and integrator at Bank Hapoalim, working with CyberArk products to design and develop systems for secure server management.

This post was written by a guest contributor. Please see their biographical details above.

Interested in writing for New Relic Blog? Send us a pitch!