“We Don’t Have Time to Fix Things” – Tips from New Relic

In software, there’s often a war between getting stuff done and making the code perfect. Generally, our profession leans towards code quality, but we often blame the business for pushing us towards only getting features done. After all, that’s ultimately what customers care about – they don’t care how pretty our code is. So many of us complain and fret about the state of our codebase, and complain that we don’t have the time to fix everything that needs fixing.

The bad news is you’re never going to get that time. Most businesses don’t have any pressure on them to focus on the state of their code unless it’s gotten bad enough that customers are really complaining about it. That means the focus is around shipping features and improvements.

But the truth is both are important. You need to keep making forward progress. And you also need to keep the area in which you work maintainable, so you can continue to make forward progress in the future.

I’d like to share a couple of tricks we use at New Relic to protect our long-term productivity, while still keeping our main focus on what our customers care about.

Measure How Much You Bake Quality In
When we ship a feature to customers, we typically spend a little time afterwards cleaning it up. To some extent that cleanup reflects necessary work, like removing feature flags. But it can also represent technical debt that becomes obvious after the feature is in the wild. For example, last year we released a feature and noticed a lot of bugs once it went live. We found it extremely difficult to fix the bugs, because the code wasn’t tested or even testable, and it wasn’t easy to make the changes to fix the bugs. This was a failure on a couple of levels, but we didn’t realize quite how bad it was until we had difficulty fixing it.

I make it a habit to notice how much time is necessary for cleanup after each project and keep a note of how we’re doing. For me it’s a measurement of the quality we’re baking into the project as we go along. Another way to measure this is to see how much time it takes for the team to focus completely on the next project after we move on to it.

The ideal state, and one that I think more experienced developers reach, is where your whole development methodology is efficient enough that you code quickly, but bake your quality into your work as you go. You write your tests as you go along and you work rapidly because it’s the natural way for you to write code. I try to evaluate how close I am to that ideal, as a way of measuring my own progress as an engineer. And I try to evaluate how our team is doing based on the same criteria.

Create a Wall Between Expanding and Sustaining Work
Another thing we do is classify all our work as either expanding work (getting done the things our customers care about) or sustaining work (making sure what we’re doing is sustainable long-term, such as fixing bugs and improving our processes). Each team is charged with spending time on both, and the manager of each team reports the percentage of each every two weeks. This forces us to be accountable and puts some pressure on us to make sure we’re spending enough time improving how we do things, so we can sustain our rapid velocity as we focus on project work.

Some examples of things that have come out of the sustaining time:

* One of our engineers cut our test suite from 21 minutes to four minutes.

* We upgraded Rails, Ruby and libraries like jQuery.

* We made improvements to a class of problems called metric grouping problems, which have been among the most time-consuming and expensive problems for engineering. They have also been one of the most frustrating experiences for our customers.

* We upgraded our time picker.

The important idea here is that you budget time for sustaining work, and you build strong protection against stealing from that time. In our experience, a 50/50 split is about right. It’s a challenging thing to do, but you have to make a strong commitment to preserve team time for sustaining work.
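To make the reporting concrete, here is a minimal sketch of how a team might compute its expanding/sustaining split for one two-week period. It is an illustration only, not New Relic’s actual tooling: the `work_log` entries, the hour values, and the `work_split` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical work log for one two-week period: (category, hours).
# The category names follow the article; the entries are made up.
work_log = [
    ("expanding", 30.0),
    ("sustaining", 22.0),
    ("expanding", 10.0),
    ("sustaining", 18.0),
]


def work_split(entries):
    """Return each category's share of the total hours as a percentage."""
    totals = defaultdict(float)
    for category, hours in entries:
        totals[category] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing logged, so there is no split to report
    return {
        category: 100.0 * hours / grand_total
        for category, hours in totals.items()
    }


split = work_split(work_log)
for category, percent in sorted(split.items()):
    print(f"{category}: {percent:.0f}%")
```

Tracking the same numbers period after period makes it easy to see when sustaining time is being squeezed.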

Ebb Week
A practice we recently revived on the Core Team is ‘ebb week.’ After you finish a project and all work associated with it, you get a week to focus on the cleanup and improvements that you think are most important (outside of the project you were just working on). For example, you can fix a longstanding issue that frustrates a lot of our customers, or you can rewrite part of the code that’s tripping everyone up.

How does it work? At the beginning of the week, you tell your manager and the team what you plan to work on. Then you spend a week working on it and tell the team what you did.

Work Incrementally
Any codebase that’s large enough has years of changes in it. And the way you and your team write software changes over time. This means that there are going to be layers of different ways of doing things and inconsistencies along the way.

There’s often the temptation to make everything consistent. If you’re thinking about it, make sure you weigh the benefits and the costs. In a large codebase, you could spend all your time making things consistent and provide no value to the business. We always work within constraints, so we’ll never have the time to make everything perfect. In fact, making it perfect is probably a bad idea as it steals time away from important things that your customers actually care about.

The pattern I’ve found best is to make any improvements all at once, if it’s easy to do so. And sometimes it really is worth spending the time and removing the ugly parts of the code. This way, people don’t ever have to think about it again.

However, it doesn’t often make sense to rewrite code that’s working perfectly well, is never touched, and doesn’t have any burning need to change. An example of this in our codebase is some of our JavaScript code. These libraries work very well and we never touch them, so there isn’t any need to change them now. In these kinds of situations, it’s helpful to get everyone to agree on how to do things in the future and, if possible, mark the code that’s perfectly fine but different from the way things are done now. This way, no one will copy that style in their future code.

When should you invest the time in cleaning up bad code? The criterion I personally use is whether people are afraid to touch the code. A small mess isn’t a big deal; just remember to improve it as you go along. But fix any code that breaks every time someone makes a change to it, or that is so brittle that people actively avoid touching it. A key factor here is often whether the code is tested and testable.

Build the Culture of Baking It In
One of our goals is to make sure our code is continually being written at a higher level of quality than it was in the past, while also improving the velocity at which we can deliver the awesome features our customers want to see.

A few things we do to bake the quality in are:

* Strongly encourage TDD
* Rely heavily on continuous integration
* Demo increments of work to someone else on the team before you merge to master
* Reject pull requests unless there are tests
* Encourage pair programming

But more than anything, we talk about it, a lot. We talk about the balance between shipping new projects and fixing longstanding problems. We talk about it if we’re leaning too much in either direction. And we all try to bake the quality into our work and to help each other get better at it, every day.

Work With Us!
One of the best things about our team is that this article will be outdated the minute it is published. We are continually experimenting with ways to improve our focus time, reduce meetings, improve communication, and ship our code faster to our customers. If you’d love to be a part of our team, please look at newrelic.com/jobs.

