As part of our “Life at New Relic” blog series, we sat down with Ron Crocker, a New Relic distinguished engineer and architect. At New Relic, we use engineering standards to help us build better software faster. Ron explains how and why New Relic formalized its institutional knowledge, best practices, and experience as “New Relic Engineering Standards.”
New Relic: Why are engineering standards important, both in general and for New Relic?
Ron Crocker: As companies grow, it’s important to codify the institutional knowledge so that everyone can continue to benefit from it. When the organization is small, it’s much easier to share the story of how the company believes software should be built. But once a company reaches a certain size, writing down the story is a better way of sharing it.
As New Relic grew, we evolved our organizational structure and needed a way to communicate our way of doing things across many different teams in many different parts of the world. We found that teams were making local decisions that didn’t necessarily leverage the investments we had already made elsewhere or take advantage of the lessons other teams had already learned.
So codifying the things that we do every day was a way to help us communicate to teams and new staff about how we do things at New Relic to make sure we’re delivering long-term reliability, scalability, consistency, and quality of our systems.
New Relic: What kinds of things are included in the standards? Can you give us an example of one?
Ron: Our standards are largely focused on using the languages, frameworks, libraries, platforms, tools, and techniques that New Relic has invested in and used successfully in our production system. We’ve focused on how things have worked for us in production: both the things we want teams to do more of and the things we want teams to do less of.
Here’s an example: One of the standards we documented says if you’re interacting with Kafka, you must use our Java client (and the JVM). We have had incidents that we could tie back to not running on the JVM when teams interacted with Kafka. That led us to conclude that the easiest way to solve that problem was just to avoid it in the first place. It also enables us to leverage any investment in this client across our many teams and services.
New Relic: It sounds like having written engineering standards means that teams don’t have to reinvent the wheel every time. Does that speed up the development process?
Ron: Velocity is one of the biggest benefits we’re hoping to see from implementing the standards. We anticipate that following the standards will make us faster at delivering customer value.
The point of having engineering standards is to help teams avoid the accidental complexity of software development so they can focus on the essential complexity, a distinction Fred Brooks draws in his paper “No Silver Bullet: Essence and Accidents of Software Engineering.”
New Relic: Did you follow a specific model for software engineering standards?
Ron: While there are numerous software industry standards, from ISO to IEEE and others, with various organizational schemes, we liked how ThoughtWorks grouped its Technology Radar guide for tools into recommendation rings. So we started by grouping our standards by ring:
- Adopt (Yes, you SHOULD do this, no approval needed)
- Trial (Yes, you MAY do this with an architect’s approval)
- Assess (No, while you MAY do this, we don’t recommend it)
- Hold (No, you SHOULD NOT do this)
We wanted the action the reader should take to be clear and obvious, so we introduced two new rings:
- Require (Yes, you MUST do this)
- Deprecated (No, you MUST NOT do this)
These reflect stronger guidance than Adopt and Hold, respectively.
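As a rough illustration, the six rings could be modeled as a small enumeration. This is only a sketch, not New Relic's actual tooling: the names `Ring`, `MANDATORY`, `is_mandatory`, and `is_recommended` are hypothetical, and the guidance strings are copied from the list above.

```python
from enum import Enum


class Ring(Enum):
    """Recommendation rings, ordered from strongest 'yes' to strongest 'no'."""

    REQUIRE = "Yes, you MUST do this"
    ADOPT = "Yes, you SHOULD do this, no approval needed"
    TRIAL = "Yes, you MAY do this with an architect's approval"
    ASSESS = "No, while you MAY do this, we don't recommend it"
    HOLD = "No, you SHOULD NOT do this"
    DEPRECATED = "No, you MUST NOT do this"


# The two rings added on top of the Technology Radar rings (hypothetical name).
MANDATORY = frozenset({Ring.REQUIRE, Ring.DEPRECATED})


def is_mandatory(ring: Ring) -> bool:
    """True for the rings that use MUST / MUST NOT wording."""
    return ring in MANDATORY


def is_recommended(ring: Ring) -> bool:
    """True for the rings whose guidance starts with 'Yes'."""
    return ring.value.startswith("Yes")


if __name__ == "__main__":
    for ring in Ring:
        flag = "mandatory" if is_mandatory(ring) else "advisory"
        print(f"{ring.name.title():<10} ({flag}): {ring.value}")
```

Keeping the wording next to each ring name means the action a reader should take travels with the classification itself.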
Aside from the groupings, it was important for us to include why we made something a standard. We want people to understand the rationale behind the choices that we make.
Finally, we chose to make our standards process open. Each standard is a separate markdown file in the standards repository in our GitHub Enterprise system. Markdown is very accessible, and having them in GitHub allows us to have an open process.
We then process that repo and publish it through GitHub Pages, which lets our custom New Relic One application present the data in a way that’s easy for people to navigate and search.
Changes to the standards are introduced through pull requests against that repository. Indeed, the first substantive change to the standards after they were published came from a team that wanted to incorporate what they do.
New Relic: You rolled out the new standards at the beginning of 2020. How long did it take to create them, and how did you introduce them?
Ron: It took us about six weeks to craft and agree on the language for the standards. Then we previewed them with the teams before we released them. It was important that everyone understood that while we were taking away some of their freedom, we did so to free them to focus on the problems we need to solve to deliver value to our customers.
The standards are owned and maintained by the architecture team. They are the technical leaders the company relies on to make difficult technical decisions that balance the long-term health of our systems with business objectives.
In terms of rolling out the standards, we decided early on that we weren’t going to tell teams to rewrite something unless there was a strong reason, such as a critical bug that needed to be fixed. The idea was to impact projects going forward. This removed a lot of the teams’ fear and allowed them to see how the standards would improve their own decision-making process.
New Relic: What benefits have you seen so far?
Ron: There are a few. First, merely having the standards written down means that they are accessible to people in Barcelona or Portland or wherever people happen to be. It’s been a great equalizer across teams and organizations. This became even more valuable in light of the pandemic and its influence on how we work; having the standards written down, already in place, made it much easier to communicate among the teams.
Another benefit is that it’s become a great tool for our architects to help their teams learn to make better decisions. That’s because they learn from the standards and understand why we decided to handle something the way we did.
We also get improved leverage of the investment that we’ve made in tooling and libraries.
Finally, we anticipate we’ll see improved reliability, though that is a lagging indicator. It will show up gradually as older parts of our systems are brought into line with the standards.
New Relic: Can you share some lessons learned or words of wisdom for other software engineering organizations that want to create engineering standards?
Ron: Absolutely! While there were many lessons learned over the past year, I would say that the most important learnings and best practices are the following:
- Write down reality: Don’t make the standards aspirational at the beginning. Instead, make them very practical and attainable to encourage everyone to understand and follow them.
- Be judicious in demands on teams: My advice to the team when we started this effort was to limit the number of standards that we put in the Require and Deprecated rings to a small handful of important things we needed teams to do. It’s easy to overwhelm teams with unfunded mandates. I’d put our conscious choice to make the standards primarily forward-looking in this category as well. Teams are unlikely to have sufficient space in their roadmaps for rewrites, and asking teams to rewrite things merely to adopt the standards is a recipe for disappointment.
- Give non-architects a voice: Originally, we focused on working within the architecture team but realized that non-architect principal engineers needed to be involved as well; they bring a different perspective to the table. Our open process also helps to bring in other voices.
- Communicate widely and often: Everyone needs to hear why standards are needed, which helps reduce the FUD around them. Communicating across the organization about what we were doing helped the architects sell it to their teams when it was time to roll it out.
- Get an executive sponsor: Like any change management process, you need a strong executive champion behind the effort to help get everyone aligned on why it’s important for the company to do this.
New Relic: What’s up next for the engineering standards?
Ron: We endeavor to publish quarterly updates to the engineering standards so that they continue to evolve as New Relic continues to innovate. We’re currently on our third release of the standards since we introduced them.
The next thing we’re doing is building a scorecard of how well our teams are doing with respect to the standards. This scorecard will also reflect on the standards themselves, giving the architecture team some direction for where to take the standards. While that requires a bit of value judgment around the importance of certain standards, we will be focusing on the ones that impact either velocity or incidents. The scorecard will then show us whether there is a correlation between a higher degree of standards compliance and fewer incidents, for example, or higher velocity.
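To make the scorecard idea concrete, here is a minimal sketch of how a compliance score could be compared against incident counts. Everything in it is an assumption for illustration: the function names, the team names, and the numbers are invented, and the interview does not describe how New Relic actually computes its scorecard.

```python
from math import sqrt


def compliance(followed: int, applicable: int) -> float:
    """Fraction of applicable standards a team follows (0.0 to 1.0)."""
    if applicable <= 0:
        raise ValueError("applicable must be positive")
    return followed / applicable


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Made-up data: (standards followed, standards applicable, incidents in period).
teams = {
    "team-a": (9, 10, 1),
    "team-b": (6, 10, 4),
    "team-c": (3, 10, 7),
}

scores = [compliance(f, a) for f, a, _ in teams.values()]
incidents = [float(i) for _, _, i in teams.values()]
r = pearson(scores, incidents)

if __name__ == "__main__":
    print(f"correlation between compliance and incidents: {r:.2f}")
```

A negative coefficient would be consistent with higher compliance going along with fewer incidents, but a correlation on its own does not show that the standards caused the difference.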
Further down the road, we’ll be looking at open-sourcing our standards tooling and maybe even the standards themselves.