Test All the Combinations!

Posted in Engineering, New Relic News, Top Post on 9 May 2013

Most applications test for their own correctness. But sometimes gems need to go further and test their interactions with other gems. newrelic_rpm – otherwise known as the New Relic Ruby agent – is definitely one of them. It needs to not only coexist with other gems, but to seamlessly instrument them.

So, how do we test code that’s heavily dependent on other components? How do we validate different combinations of frameworks that can occur in the wild? Read on for the nitty gritty of testing the Ruby agent in multiple environments.

The Uncertainties of Life (and Testing)
Before coming to New Relic, I was an app developer. At that time I never realized how much simpler life was when you could choose 1) what dependencies to take, 2) how the production environment was configured and 3) the fundamentals, like language versions. Unfortunately, library authors don’t get these certainties.

The richness of the gem ecosystem keeps the Ruby agent team on our toes. We support at least six different Ruby versions (depending on exactly how you count), eight versions of Rails, and a pile of other frameworks. If you ever need to be sure your code runs against Rails 2.0 and Ruby 1.8.6, I’ve got the test suite for you!

At the highest level, we follow a standard unit / functional (or integration) test breakdown. Let’s take a look at the unit tests first.

Unit Testing
Unit tests are the front line of our testing. They run quickly (about 10 seconds locally for the full suite of more than 1,000 tests) and carry the lightest dependencies. We execute them across the broadest range of Ruby and Rails version combinations.

Historically Rails has been a big focus for the agent, so the unit tests run in the context of a Rails application. While most of the agent’s unit tests could run without any external dependencies, we’ve encountered many subtle bugs from interesting intersections between Rails (I’m looking at you, ActiveSupport) and the various flavors of Ruby. Running the unit tests under Rails helps flush out these odd interactions. For example, we found one problem with subtly incompatible versions of the to_json method floating around during the 2.x versions of ActiveSupport.

Which version of Rails do we run against? Well, if you download the agent source and run the unit tests, you’ll see this:

♥♥♥~/source/newrelic/ruby_agent [1.9.3] [release]:rake test

/Users/jclark/.rbenv/versions/1.9.3-p374/bin/ruby -I"lib:/Users/jclark/source/newrelic/ruby_agent/test:/Users/jclark/source/newrelic/ruby_agent/lib" -I"/Users/jclark/.rbenv/versions/1.9.3-p374/lib/ruby/gems/1.9.1/gems/rake-10.0.3/lib" "/Users/jclark/.rbenv/versions/1.9.3-p374/lib/ruby/gems/1.9.1/gems/rake-10.0.3/lib/rake/rake_test_loader.rb" "/Users/jclark/source/newrelic/ruby_agent/test/new_relic/**/*_test.rb"

Running tests in standalone mode.
Run options:

# Running tests:

...more dots...

Finished tests in 8.206038s, 129.6606 tests/s, 338.4094 assertions/s.

1064 tests, 2777 assertions, 0 failures, 0 errors, 0 skips

The line ‘Running tests in standalone mode’ hints at what’s going on. In ‘standalone mode’ we construct a tiny Rails app in the agent that – by default – runs a Rails 3.2 app.

But what about all those other versions of Rails? That’s where rpm_test_app comes in. It’s a full, separate application with branches for each flavor of Rails we support. These get stitched together with some Rake goodness to run the unit tests for newrelic_rpm in rpm_test_app’s context:

# agent_home is expected to point at the newrelic_rpm source checkout
Rake::TestTask.new(:newrelic) do |t|
  t.libs << "#{agent_home}/test"
  t.libs << "#{agent_home}/lib"
  t.pattern = "#{agent_home}/test/new_relic/**/*_test.rb"
  t.verbose = true
end

With `rake test:newrelic`, our CI system can run the gamut of Rails and Ruby versions. Jenkins configurations let us dictate which combinations make sense given that, for instance, you can’t run Rails 3.x and greater on Ruby 1.8.6.
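To make the idea of pruning combinations concrete, here is a minimal sketch of building such a matrix in plain Ruby. The version lists and the `compatible?` and `build_matrix` helpers are hypothetical; our real matrix lives in Jenkins configuration, and the only rule encoded here is the Rails 3.x / Ruby 1.8.6 restriction mentioned above.

```ruby
require 'rubygems' # for Gem::Version comparisons

# Hypothetical version lists, for illustration only.
RUBIES = %w[1.8.6 1.8.7 1.9.2 1.9.3].freeze
RAILS  = %w[2.0 2.3 3.0 3.2].freeze

# The one rule taken from the article: Rails 3.x and later cannot run on Ruby 1.8.6.
def compatible?(ruby, rails)
  !(ruby == '1.8.6' && Gem::Version.new(rails) >= Gem::Version.new('3.0'))
end

# Every Ruby/Rails pair, minus the ones that cannot work together.
def build_matrix(rubies, rails_versions)
  rubies.product(rails_versions).select { |ruby, rails| compatible?(ruby, rails) }
end

build_matrix(RUBIES, RAILS).each do |ruby, rails|
  puts "ruby #{ruby} + rails #{rails}: rake test:newrelic"
end
```

Keeping the rules in one predicate means a new restriction is a one-line change rather than a hand-edited list of jobs.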


Not All About Rails
That’s all good, but there’s much more to Ruby than just Rails. With the rich variety of frameworks to support, we needed an approach to mix and match dependencies.

This need led to our functional/integration testing layer, affectionately referred to as the ‘Pangalactic Multiverse’ tests.

At heart, Multiverse is a series of suites, each representing a collection of dependencies. We have one for Sinatra, one for Resque and one for ActiveRecord outside of Rails. The exact versions are specified by an Envfile:

suite_condition("Sinatra not compatible with 1.8.6") do
  RUBY_VERSION != '1.8.6'
end

gemfile <<-RB
  gem 'sinatra', '1.3.3'
  gem 'rack-test', :require => 'rack/test'
RB

gemfile <<-RB
  gem 'sinatra', '1.2.8'
  gem 'rack-test', :require => 'rack/test'
RB

Multiverse reads the Envfile and generates a Gemfile for each call to `gemfile`. It will bundle install those dependencies and run the suite’s tests in a separate child process.
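If you’re curious how a file like that can be read, here is a rough sketch of an Envfile-style DSL. This is not Multiverse’s actual code: the `EnvfileSketch` class, its `load` method, and the `source` line it prepends are all assumptions made for illustration. It only collects one Gemfile body per `gemfile` call and records a skip reason; it doesn’t bundle or fork anything.

```ruby
# Hypothetical stand-in for an Envfile reader; not Multiverse's real implementation.
class EnvfileSketch
  attr_reader :gemfiles, :skip_reason

  def initialize
    @gemfiles = []
    @skip_reason = nil
  end

  # Records the reason to skip the suite when the block returns false.
  def suite_condition(reason)
    @skip_reason = reason unless yield
  end

  # Each call contributes one dependency set, i.e. one generated Gemfile.
  def gemfile(contents)
    @gemfiles << "source 'https://rubygems.org'\n#{contents}"
  end

  # Evaluates the Envfile text with the DSL methods above in scope.
  def self.load(text)
    new.tap { |env| env.instance_eval(text) }
  end
end

envfile = <<-'ENVFILE'
suite_condition("Sinatra not compatible with 1.8.6") do
  RUBY_VERSION != '1.8.6'
end

gemfile <<-RB
  gem 'sinatra', '1.3.3'
RB

gemfile <<-RB
  gem 'sinatra', '1.2.8'
RB
ENVFILE

env = EnvfileSketch.load(envfile)
puts "#{env.gemfiles.size} environments to run"
```

From there, a runner could write each entry to its own Gemfile and launch the suite in a child process for each one.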

Unlike the unit tests that focus on individual classes, Multiverse tests run a full agent cycle – starting up, monitoring work, shutting down and transmitting data. They even run a fake service endpoint so data flows over HTTP, keeping as close to the runtime reality of the agent as possible.

With the additional environment spin-up and running our full stack, Multiverse tests are slower, but vital for coverage. We can easily write targeted automated tests against multiple different versions of individual frameworks, and validate that the agent works with new releases as they come out.

The Envfile lets each suite run multiple sets of dependent versions. And our CI system takes care of running Multiverse against various Ruby versions, too. So many combinations!


The Hard Part
Testing always presents unique challenges. And a library like the Ruby agent introduces its own set of fun twists.

It’s About the Environment
A lot of the work on our testing system has gone into getting environment setup scripted, both locally and in CI. We’re currently using a solution with Vagrant and Puppet that lets us spin up consistently formatted build boxes.

Leaky Test State
Testing needs to mimic reality to find some types of problems. This presents difficulties for the Ruby agent.

In production, the agent starts once per process. However, in a test run, the agent might restart hundreds of times. We’ve had to take care to avoid leaking state between tests. Otherwise we could be faced with tests that work fine in isolation, but break when run in the suite. However, an upshot of this pain is the test_bisect gem my coworker Jon wrote. If you’ve ever suffered from leaky tests you should check it out.
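The general idea behind bisecting a leaky suite can be sketched in a few lines. This is not how test_bisect is implemented; `find_polluter` and the `fails_with` callable are hypothetical, and the sketch assumes a single earlier test is responsible for the failure.

```ruby
# Finds one earlier test whose presence makes a victim test fail.
# `fails_with` is a hypothetical callable: given the tests to run before the
# victim, it returns true when the victim then fails.
def find_polluter(candidates, fails_with)
  return nil unless fails_with.call(candidates)

  while candidates.size > 1
    half = candidates.size / 2
    first = candidates[0...half]
    second = candidates[half..-1]
    # Keep whichever half still reproduces the failure.
    candidates = fails_with.call(first) ? first : second
  end
  candidates.first
end

# Simulated suite where the (made-up) test 'd' leaves state behind.
earlier_tests = %w[a b c d e]
leaky = ->(subset) { subset.include?('d') }
puts find_polluter(earlier_tests, leaky)
```

In a real run, `fails_with` would shell out to the test runner with the chosen subset, so each step costs one partial suite run.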

Timeliness Counts
It’s also hard to write good tests for monitoring time and the underlying system. It’s much easier to write brittle timing tests that run fine locally, but break when deployed to the slower VM-based CI system under heavy load. Given how much of the Ruby agent’s functionality is about accurate timing, this is a continuing pain point we wrestle with in testing. In general, we’ve moved toward stubbing time values in unit tests and applying thresholds or percentage comparisons rather than strict timeframes whenever possible in functional tests.
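Here is a small sketch of both techniques. The `SketchTimer` class and `within_tolerance?` helper are hypothetical and not part of the agent; the point is that an injected clock makes unit-level timing exact, while a tolerance check keeps functional-level timing from being brittle.

```ruby
# Hypothetical timer that takes its clock as a dependency so tests can stub it.
class SketchTimer
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock
  end

  # Returns the seconds the clock reports as elapsed around the block.
  def measure
    started = @clock.call
    yield
    @clock.call - started
  end
end

# Unit-test style: a fake clock with scripted readings gives an exact answer.
ticks = [100.0, 100.25]
fake_clock = -> { ticks.shift }
stubbed_elapsed = SketchTimer.new(fake_clock).measure { :work }

# Functional-test style: accept anything within a fraction of the expected value.
def within_tolerance?(actual, expected, fraction)
  (actual - expected).abs <= expected * fraction
end

real_elapsed = SketchTimer.new.measure { sleep 0.05 }
puts stubbed_elapsed
puts within_tolerance?(real_elapsed, 0.05, 1.0)
```

How generous the tolerance should be depends on how noisy the CI hosts are; the value above is only a placeholder.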

Need for Speed
Running Multiverse is relatively slow, so we’ve worked hard to prune that time. Two major approaches were: a) keeping suites coarse grained and b) applying local bundling. Both of these hinged on setup being the slowest part of Multiverse.

In the first case, our early experience saw new Multiverse suites cropping up per feature. Often these would just copy an existing environment with minimal tweaks. Merging those suites along the lines of their gem dependencies gave a huge performance bump. Getting the right granularity for grouping tests was a big win.

Second, the Multiverse setup tries to bundle without contacting RubyGems, but re-runs a full bundle if that doesn’t work. Less network chatter = faster setup = faster tests.
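The fallback can be sketched like this. `bundle install --local` is Bundler’s flag for resolving from already-installed gems without hitting the network; the `bundle_with_local_first` helper and its injectable runner are assumptions for illustration, not Multiverse’s actual setup code.

```ruby
# Tries a network-free install first and only falls back to a full install
# when that fails. The runner is injectable (an assumption made so the logic
# can be exercised without Bundler); by default it shells out with `system`.
def bundle_with_local_first(runner = ->(cmd) { system(cmd) })
  return :local if runner.call('bundle install --local')
  return :full if runner.call('bundle install')

  raise 'bundle install failed'
end

# Usage inside a suite directory that contains a Gemfile:
#   bundle_with_local_first
```

After the first successful full install on a build box, later runs with the same Gemfile can usually take the local path.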

That’s how we deal with testing multiple frameworks and Ruby environments on the Ruby agent team. It’s given us a nice balance: fast unit tests against the most common cases, with easy access to full-stack Multiverse tests to cover other configurations.

Are you a gem author who tests against different frameworks and Ruby implementations? Got a trick (or two) that’s helped you keep your dependency testing under control? Tell us about your experiences in the comments below.
