Here at New Relic, DevOps is an integral part of how we operate—different teams collaborating to help foster more frequent deploys, faster issue resolution, and more stability and innovation across the organization. But while we’ve written a lot about DevOps over the years, one thing we haven’t talked much about is how we actually practice DevOps here at New Relic.
In this post I’d like to share how we approach DevOps internally at New Relic, including the key practices and technology tools we use as well as the benefits we’ve seen from our efforts so far.
New Relic has been following a DevOps methodology for some time now. It originated with Nic Benders (currently our Chief Architect and VP of Engineering) when he established the site reliability engineering team back in 2012. Some early members of that team became internal DevOps thought leaders, and before long many of our site reliability engineers (SREs) were eager to try out a more organized DevOps approach.
DevOps initiatives don’t always originate from the ground up like this. In many organizations, DevOps is imposed by management, often simply because it has become a trendy buzzword: “I’ve heard about this thing called ‘DevOps’ and we’re going to try it. Let’s start doing DevOps.” At New Relic, though, it was the engineers themselves who became aware of the concept and started to try it out, and from there it developed organically without management standing in the way (always a sign of smart management).
Still, the move toward DevOps initially required a big cultural shift—a tearing down of the walls that traditionally separate development and operations teams so that both groups were working together toward the shared goal of customer happiness. But, of course, DevOps is more than just a new mindset; we needed to make sure we had the right processes and tools in place.
Key DevOps practices
One of the first things we tried was what we called “Forward SREs,” which are SREs assigned to different teams on a sort of “consultant” basis to help keep things running smoothly. Our Forward SREs were responsible for different product teams—often several at once—but they weren’t necessarily part of those teams.
Today, the Metrics Pipeline team has taken this a step further with a role we’re calling a “Product Reliability Engineer” or PRE, a type of SRE whose focus is on a particular product. The development of this new role was an attempt to embed an SRE directly into a specific team. These PREs no longer communicate with developers through a ticketing system, which in my experience invariably results in bottlenecks and friction; instead they join the team itself and become part of it. The team’s goals become their goals, and vice versa.
Beyond creating specific DevOps-friendly roles, we also paid attention to how our teams were structured. We experimented with a lot of different organizational structures over the years, and if something wasn’t working, we weren’t afraid to scrap it and try a new approach. In fact, our engineering teams recently participated in a huge, innovative reorganization initiative we called “Project Upscale,” in which everyone was given the opportunity to choose the teams they wanted to work on. If DevOps is all about knocking down barriers between teams, then Project Upscale was the ultimate DevOps move.
Key DevOps technologies
Now I want to talk about two of the technology tools that make our DevOps initiatives run smoothly: Cassandra and Docker.
Cassandra is a distributed datastore that allows New Relic to scale. Traditionally we’d store metric data in a relational database like MySQL, which would be owned by a separate team (uh oh! barriers between teams). The beauty of Cassandra is that it’s a database that lets itself be treated as a service, and it’s written in Java, so developers can look into the guts of the code to see what it’s doing. Because of this, everyone on the team, developers and operations alike, knows how to operate Cassandra, how to deploy it, how to stop it when it’s acting up, and how to replace it.
Docker is essential for us because it helps developers do operations work. For example, let’s say we have a tech stack that includes Ruby apps, Java apps, and so on. If you Docker-ize all those apps, you can use the same commands for everything. Without Docker, you’d need one script for your Ruby apps, another for your Java apps, and a Puppet script for your hardware and the databases. Docker makes all this much easier by offering a kind of API for operating on any technology with the same commands. In fact, New Relic has been pretty cutting edge in our use of Docker. (We also run Cassandra inside a Docker container to make it even more DevOps-friendly.)
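To make the “same commands for everything” idea concrete, here is a minimal sketch in Python. It is only an illustration, not how New Relic actually drives Docker: the app names, image tags, and the `docker_command` helper are all hypothetical. The point is that once every app is a container, the lifecycle commands (`docker run`, `docker logs`, `docker stop`) look identical whether the image holds a Ruby app or a Java app.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class App:
    name: str   # container name (hypothetical)
    image: str  # image reference (hypothetical)


def docker_command(action: str, app: App) -> List[str]:
    """Build the docker CLI invocation for one lifecycle action.

    The command depends only on the container name and image,
    never on the language the app inside is written in.
    """
    if action == "start":
        return ["docker", "run", "-d", "--name", app.name, app.image]
    if action in ("stop", "restart", "logs"):
        return ["docker", action, app.name]
    raise ValueError(f"unsupported action: {action}")


# Hypothetical apps built on different stacks.
apps = [
    App("billing-ui", "example/billing-ui:1.0"),    # imagine a Ruby app
    App("metrics-api", "example/metrics-api:1.0"),  # imagine a Java app
]

# Print the commands instead of running them, so the sketch
# works on a machine without Docker installed.
for app in apps:
    for action in ("start", "logs", "stop"):
        print(shlex.join(docker_command(action, app)))
```

Printing the commands rather than executing them keeps the example safe to run anywhere; in practice you would hand each list to something like `subprocess.run`.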
Using our own tools
Like our customers, we rely on some of our own tools to do DevOps well. Since DevOps is all about breaking down walls and allowing dev and ops to work as one, it’s important to find a common language they can speak and a common toolset they can use. New Relic Insights is ideal in this regard as it allows data from the “development world” and the “operations world” to be combined and displayed in one dashboard. When everybody has one unified place to look, and the data displayed there is shown in clear, easy-to-understand charts, then all members of the team are operating from the same place. This can be very powerful. And it’s not just the technical teams who have easy access to this information; every member of the organization does.
Two other New Relic tools that are helpful for any DevOps initiative are New Relic APM (more easy-to-understand charts and metrics!) and New Relic Alerts—especially our recently unveiled New Relic Dynamic Baseline Alerts, which can help people across diverse teams monitor the status of any specific application.
(For more information on this topic, be sure to read 6 Ways New Relic Can Help You Do DevOps Better.)
Benefits of DevOps
The benefits we get from our DevOps approach to software development include the removal of bottlenecks and hard stops that can frustrate development teams. We use a simple system to measure whether or not different teams are blocked, with red, yellow, and green status indicators. My team, fortunately, has never been red, which means we’ve never been blocked by another team. In fact, we’re almost always green.
Additionally, our SREs help instill a strong written culture by documenting everything, creating runbooks for all operations, doing writeups of incidents, and so on. This helps reduce the bus factor on each team, because all that knowledge is documented and accessible to everyone, not just stuck in the head of a single developer.
A less-obvious benefit of a DevOps culture relates to employee retention: as a group, engineers tend to prefer a startup culture to that of a huge corporate enterprise. Operating under a DevOps model gives our engineers the feeling of working in a small startup while still reaping the benefits of working for a larger company.
Want to learn more about DevOps? Visit our DevOps Hub and be sure to check out these additional DevOps resources:
- Dashboards for DevOps: Examples of What to Measure [blog post]
- 6 Ways New Relic Can Help You Do DevOps Better [article]
- DevOps Without Measurement Is a Fail [ebook]
- New Relic Helps Global Enterprises Accelerate DevOps Success Through Measurement Across the Application Stack [press release]
Also, be sure to sign up for our free webinar, Lessons Learned from New Relic’s DevOps Journey, on February 15, 2017, at 11:00 a.m. PT, presented by New Relic Senior Site Reliability Engineer Jonathan Owens.