Testing your SLA with Loader and New Relic Insights


Guest author Lorin Hochstein is a lead software engineer for SendGrid Developer Services, the company behind Loader.io, which provides simple cloud-based load testing. This post is adapted from a post on the SendGrid DS blog.

So, you’ve decided that you’d like to do some load testing on your website. What’s an acceptable response time?

I like to think of response time as a service-level agreement (SLA): the level of responsiveness that you are going to commit to offer to your users. According to usability research, if an interactive site takes longer than 1 second to load, it will interrupt the user’s flow of thought, and if it takes longer than 10 seconds it will lose the user’s attention.

We also need to think about our SLA in terms of statistics, since we are dealing with many responses rather than one. The mean alone is not enough, because outliers can be quite common and a single average hides them. Your site may respond to most requests in under 300 ms, but once in a while it may take 1.5 seconds to respond.
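To make the effect of an outlier concrete, here is a minimal sketch in Python. The sample values are made up for illustration and do not come from a real test run.

```python
from statistics import mean, median

# Hypothetical sample: 19 fast responses and one slow outlier, in milliseconds.
response_times_ms = [250] * 19 + [1500]

average_ms = mean(response_times_ms)    # pulled upward by the single outlier
typical_ms = median(response_times_ms)  # unaffected by the outlier

print(f"mean   = {average_ms} ms")
print(f"median = {typical_ms} ms")
```

A single slow request moves the mean from 250 ms to 312.5 ms, yet the mean still gives no hint that one user waited 1.5 seconds.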

Percentiles are a useful statistical tool to avoid the problems of outliers. For example, we might say that we want at least half of the requests to respond in under 400 ms, at least 80% of the requests to respond in less than 500 ms, and at least 95% of the requests to respond in under 1 second.
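As a rough sketch, the example SLA can be written down as a table of percentile limits and checked against a list of response times. The helper names (`percentile`, `check_sla`) and the sample data are hypothetical, and the nearest-rank method used here is just one common way to define a percentile.

```python
# SLA from the example: percentile -> response time limit in milliseconds.
SLA_MS = {50: 400, 80: 500, 95: 1000}

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]

def check_sla(response_times_ms, sla=SLA_MS):
    """Return {pct: (observed_ms, limit_ms, passed)} for each SLA percentile."""
    results = {}
    for pct, limit in sla.items():
        observed = percentile(response_times_ms, pct)
        results[pct] = (observed, limit, observed < limit)
    return results

# Hypothetical sample of 20 response times in milliseconds.
sample = [210, 220, 250, 260, 280, 300, 310, 330, 350, 380,
          390, 410, 430, 450, 470, 480, 520, 700, 1200, 1500]

report = check_sla(sample)
for pct, (observed, limit, ok) in report.items():
    print(f"p{pct}: {observed} ms (limit {limit} ms) -> {'ok' if ok else 'MISSED'}")
```

With this sample the 50th and 80th percentile targets are met, while the 95th percentile lands at 1200 ms and misses the 1-second limit.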

Let’s see how we can use Loader along with New Relic Insights to check whether we are hitting our SLA.

Configuring Loader to push data to New Relic Insights

You’ll need to enable New Relic Insights integration in the integrations menu and then specify your New Relic Insights credentials:

[Screenshot: Testing SLA 1]

[Screenshot: Testing SLA 2]

Defining a test

For this blog post, I used a load test of 1000 clients over one minute against a machine in one of our test environments.

[Screenshot: Testing SLA 3]

Loader data available in New Relic Insights

The simplest way to see what data is available in New Relic Insights is to run a test, and then do a NRQL query like this:

SELECT * from loaderio

[Screenshot: Testing SLA 4]

For this blog post, we’ll focus on avg_time (the average response time for a data interval) and clients (the number of test clients active during a time interval).

You can find more detailed info about these fields on the Loader New Relic Insights Integration documentation page.

Setting up a New Relic Insights dashboard

On my dashboard, I like to see the maximum number of active clients during the test interval, as well as a plot of the number of clients over time. This info lets me know that the test is actually doing what I think it is.

I like how the New Relic Insights dashboard lets me peg the views to certain time intervals. When I’m watching the test live, I do:

SELECT * from loaderio SINCE 1 minute ago

After I’ve run my tests and want to review the results, I specify the time interval in which the test took place:

SELECT * from loaderio SINCE '2014-10-06 19:53:00' UNTIL '2014-10-06 19:54:30'
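If you review many test runs, typing the timestamps by hand gets tedious. Here is a small sketch that builds the same kind of query from a start and end time. The function `loader_query` is a hypothetical helper, not part of Loader or New Relic, and it only formats a string. It passes the timestamps through unchanged, so any time zone handling is left to you.

```python
from datetime import datetime

# Timestamp layout used in the SINCE/UNTIL example above.
NRQL_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def loader_query(select, start, end, event_type="loaderio"):
    """Build an NRQL query string pinned to the interval from start to end."""
    since = start.strftime(NRQL_TIME_FORMAT)
    until = end.strftime(NRQL_TIME_FORMAT)
    return f"SELECT {select} FROM {event_type} SINCE '{since}' UNTIL '{until}'"

# The test window from the example above.
test_start = datetime(2014, 10, 6, 19, 53, 0)
test_end = datetime(2014, 10, 6, 19, 54, 30)

query = loader_query("*", test_start, test_end)
print(query)
```

The printed string can be pasted into the Insights query box as-is.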

These screenshots show views pegged to a specific time interval, but in my NRQL examples I’ll assume you’re viewing live data.

SELECT max(clients) from loaderio SINCE 1 minute ago
SELECT average(clients) FROM loaderio TIMESERIES SINCE 1 minute ago

[Screenshot: Testing SLA 5]

Next, I put up some displays that show the SLA metrics we’re tracking. I’m going to use the same numbers as in the example I mentioned earlier: at least 50% of the requests should respond in under 400 ms, at least 80% in under 500 ms, and at least 95% in under 1000 ms. I used the critical threshold feature so that the numbers turn red if they exceed these thresholds.

SELECT percentile(avg_time, 50) from loaderio SINCE 1 minute ago
SELECT percentile(avg_time, 80) from loaderio SINCE 1 minute ago
SELECT percentile(avg_time, 95) from loaderio SINCE 1 minute ago

[Screenshot: Testing SLA 6]

We can see that we’re hitting our 50th percentile and 80th percentile metrics, but not our 95th percentile, which suggests that we have some outliers.
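One thing to keep in mind when reading these numbers: avg_time is already an average over a data interval, so percentile(avg_time, 95) is a percentile of interval averages rather than of individual requests. The sketch below uses made-up response times and a hypothetical `nearest_rank` helper to show how far the two can drift apart when slow requests are spread across intervals.

```python
from statistics import mean

def nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]

# Hypothetical raw response times (ms), grouped by reporting interval.
intervals = [
    [200, 220, 240, 260, 1800],
    [210, 230, 250, 270, 290],
    [190, 210, 230, 250, 2000],
    [205, 225, 245, 265, 285],
]

# What a per-interval average field would contain, one value per interval.
interval_averages = [mean(times) for times in intervals]

# Every individual request, flattened into one list.
raw_times = [t for times in intervals for t in times]

p95_of_averages = nearest_rank(interval_averages, 95)
p95_of_requests = nearest_rank(raw_times, 95)

print(f"p95 of interval averages:   {p95_of_averages} ms")
print(f"p95 of individual requests: {p95_of_requests} ms")
```

In this made-up data the interval averages put the 95th percentile at 576 ms, while the individual requests put it at 1800 ms. A missed target on avg_time is therefore a strong hint that individual requests are faring worse.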

I also like to visualize how these SLA metrics vary over time, as well as check the distribution of response times across the entire test period:

SELECT average(avg_time), percentile(avg_time, 50, 80, 95) from loaderio SINCE 1 minute ago TIMESERIES AUTO

[Screenshot: Testing SLA 7]

Finally, here’s what the whole dashboard looks like:

[Screenshot: Testing SLA 8]

New Relic Insights Integration is available to all Loader users, including those on the Free Loader.io plan.
