The team responsible for building the New Relic Infrastructure agent and on-host integrations has been hard at work leveraging Jenkins to speed up our release time, remove manual testing, and broaden our test harness.
When you run the Infrastructure agent on your Linux or Windows servers, it samples metrics and inventory data about your infrastructure, and you can use New Relic to view unified performance charts, create custom alerting scenarios, or gather metrics and data in customized dashboards in New Relic Insights. You can extend the capabilities of the agent by installing our on-host integrations, which allow you to retrieve metrics for services such as Nginx or Cassandra running on your servers.
Currently, we develop and maintain one agent, seven on-host integrations, and an SDK for developing integrations. In this post, I’ll share how we developed a full Jenkins pipeline for releasing new versions of our on-host integrations.
First, the complicating factors
For our agent and integrations, we support multiple versions of six Linux distros: Ubuntu, Debian, CentOS, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, and Amazon Linux. (We also ship an agent for Windows, but this post will just discuss our Linux coverage.)
The diversity of systems we support adds complexity at two key points: 1) package management, meaning how the agent is installed, updated, and removed (APT, yum, and Zypper); and 2) service management, meaning how the agent is started, stopped, and restarted (Upstart, systemd, and System V).
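To make those two points of variation concrete, here is a minimal sketch of how a test harness might choose the right command for each package manager and service manager. The lookup tables, the function names, and the `example-agent` package name are illustrative assumptions, not our actual tooling.

```python
from typing import Callable, Dict, List

# Hypothetical lookup table: package manager -> base install command.
INSTALL_COMMANDS: Dict[str, List[str]] = {
    "apt": ["apt-get", "install", "-y"],
    "yum": ["yum", "install", "-y"],
    "zypper": ["zypper", "--non-interactive", "install"],
}

# Hypothetical lookup table: service manager -> function building a restart command.
RESTART_COMMANDS: Dict[str, Callable[[str], List[str]]] = {
    "systemd": lambda service: ["systemctl", "restart", service],
    "upstart": lambda service: ["initctl", "restart", service],
    "sysv": lambda service: ["service", service, "restart"],
}


def install_command(package_manager: str, package: str) -> List[str]:
    """Return the argument list that installs `package` with the given manager."""
    return INSTALL_COMMANDS[package_manager] + [package]


def restart_command(service_manager: str, service: str) -> List[str]:
    """Return the argument list that restarts `service` under the given manager."""
    return RESTART_COMMANDS[service_manager](service)


if __name__ == "__main__":
    # "example-agent" is a placeholder name used only for illustration.
    print(install_command("zypper", "example-agent"))
    print(restart_command("upstart", "example-agent"))
```

The sketch only builds the commands rather than running them, so it can be exercised on any machine regardless of which package or service manager is installed.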
That’s a lot to support. The differences in package and service managers are one issue, but the bigger problem is that we can’t assume the agent and integrations will behave the same way across multiple versions of the same distro; there are often subtle differences between versions that could affect the way we interact with them.
For our on-host integrations, we support Redis, Apache web server, Nginx, MySQL, Cassandra, and StatsD (which I’m excluding from this post because it uses a different approach to installation and integration). As with the distros, we support multiple versions of each. Given this, we have to test the agent integrations against different versions of the service managers included in different versions of all the distros we officially support. Sounds exhausting, right?
This is where Jenkins comes into play.
From packaging to integration tests
When we release a new build of an agent, Jenkins runs two separate jobs inside the pipeline: 1) it downloads the code, compiles it, and runs the unit tests; 2) it downloads the code, packages it, and uploads it to our internal yum, APT, and Zypper repos in the Artifactory repository, our dev repo.
This triggers our integration tests, and Jenkins installs multiple versions of the service related to the integration. For example, to test our Nginx integration, Jenkins would install versions 1.10.3, 1.12.0, and 1.13.2 on each Linux operating system and version we support. Jenkins maintains a fleet of worker nodes with all the different operating systems installed, and it uses these nodes to run the services and execute the integration tests against them.
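The combinations described above form a simple cross product of platforms and service versions. The following sketch shows one way to enumerate it; the Nginx versions come from the example above, while the platform labels and the `build_matrix` helper are hypothetical and stand in for the real worker-node list.

```python
from itertools import product
from typing import Dict, List

# Service versions taken from the Nginx example in the text.
NGINX_VERSIONS = ["1.10.3", "1.12.0", "1.13.2"]

# Hypothetical subset of worker-node platforms; the real fleet covers more.
PLATFORMS = ["ubuntu-16.04", "centos-7", "sles-12"]


def build_matrix(service_versions: List[str], platforms: List[str]) -> List[Dict[str, str]]:
    """Return one test cell per (platform, service version) combination."""
    return [
        {"platform": platform, "service_version": version}
        for platform, version in product(platforms, service_versions)
    ]


if __name__ == "__main__":
    for cell in build_matrix(NGINX_VERSIONS, PLATFORMS):
        print(cell)
```

With three platforms and three versions, the matrix has nine cells, which illustrates how quickly the number of test runs grows as distros and service versions are added.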
The integration tests consist of a program that executes the integration binary and checks its output. For every integration, we’ve defined a JSON schema for the output, and the tests check that the output contains all the expected attributes and that the types of the values are correct.
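As a rough illustration of that kind of check, the sketch below verifies that a JSON payload contains every expected attribute with the expected type. It is a simplified stand-in, not a full JSON Schema validator, and the attribute names in `EXPECTED_ATTRIBUTES` are invented for the example rather than taken from our real schemas.

```python
import json
from typing import Dict, List

# Hypothetical schema: attribute name -> expected Python type.
EXPECTED_ATTRIBUTES: Dict[str, type] = {
    "name": str,
    "integration_version": str,
    "metrics": list,
}


def validate_output(raw_output: str, expected: Dict[str, type]) -> List[str]:
    """Return a list of problems found; an empty list means the output passes."""
    data = json.loads(raw_output)
    problems = []
    for attribute, expected_type in expected.items():
        if attribute not in data:
            problems.append(f"missing attribute: {attribute}")
        elif not isinstance(data[attribute], expected_type):
            actual = type(data[attribute]).__name__
            problems.append(
                f"wrong type for {attribute}: "
                f"expected {expected_type.__name__}, got {actual}"
            )
    return problems


if __name__ == "__main__":
    # In a real test, this string would be the captured output of the integration binary.
    sample = '{"name": "example", "integration_version": "1.0.0", "metrics": []}'
    print(validate_output(sample, EXPECTED_ATTRIBUTES))
```

Returning a list of problems instead of failing on the first one makes a broken build easier to diagnose, since a single run reports every missing or mistyped attribute.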
Here’s a typical Jenkins job matrix for our Redis integration tests:
From publishing to integration
Assuming all the integration tests pass, we publish the new package to our production repos, which we host in an Amazon Web Services (AWS) S3 bucket. For this step, we download the package from our Artifactory repo, add it to the corresponding package manager production repo, and update each package’s metadata.
When we publish a package to the production repo, Jenkins kicks off the final test suite: installation testing. Here, Jenkins uses its worker nodes to install the packages from staging and executes three tests to check that the upgrade process is successful. This final installation test is triggered when we kick off a publish job, and it also runs every hour as a repo health check.
In the end, the whole thing looks like this:
Seeing the payoff
Building this pipeline took a great team effort, but we’re already starting to reap the benefits. Recently, we managed to release six new beta integrations in a single week, with four more coming soon. The installation testing alone has been a tremendous boon: It has already helped us to discover a bug caused by the differences in how Zypper, APT, and yum manage dependencies.
We still have plenty of room for improvement, but we have already greatly reduced the challenges of adding new integrations and of supporting new operating systems. Even as the number of new integrations increases, the team’s toil has significantly decreased.