As our recently announced research illustrates, organizations that prioritize software excellence also report higher revenue growth and say they are ahead of their peers on brand perception, pace of innovation, employee engagement, and more. We spoke with Gene Kim, DevOps researcher and WSJ bestselling author, to solicit his thoughts on what organizations can do to develop more perfect software—and why some are performing dramatically better than others.
New Relic: How can organizations improve their approach to software development?
Gene Kim: I love Jon Smart’s definition of DevOps: better value, sooner, safer, and happier. It means that organizations can innovate more quickly and rapidly deliver value to their customers while preserving reliability, stability, and security.
In my view, organizations that master this new mode of production will be the ones that survive and win. And it’s not just commercial organizations—it’s also government agencies, military services, and so forth.
New Relic: What’s stopping organizations from developing more perfect software?
Gene: In my book, The Unicorn Project, I pinpoint a couple of areas that are very important for improving software development.
One is software development teams’ ability to work independently. Can organizations achieve the Amazon ideal of the “two-pizza team,” where each team can independently develop, test, and deploy value to customers? Or do they have 40 different, interconnected teams that must always be communicating, coordinating, prioritizing, synchronizing, and scheduling together? If it’s the latter, then nothing can get done.
The second area is focus, flow, and joy. These states are easier to achieve in small, independently operating teams that provide locality and simplicity. This frees developers to use their best energies to solve business problems—as opposed to getting mired in things such as process, approval, and architecture review board meetings. The state of flow they achieve by focusing on a single issue means they are much more productive.
New Relic: To what extent do legacy systems and architecture make it difficult to develop more perfect software?
Gene: When you have a lot of legacy systems that have accumulated decades of technical debt, it becomes impossible to get things done. There’s one way to solve that: by improving daily work.
And I love the philosophy that improvement of daily work is more important than daily work itself. If you’re spending all your time developing features, the inevitable consequence is that technical debt will make even easy things impossible. Without maintenance and renewal, everything turns into legacy.
New Relic: A key theme we’ve explored in the research is observability—organizations’ real-time view of how all their software and systems are performing on a single platform. How does observability help to improve organizations’ approach to software development?
Gene: Whether you are operating on a patient, launching a rocket into space, or developing software, you need to be able to see what you are doing—that means observability.
Telemetry enables you to link cause and effect. So, using the operating theater metaphor, it tells you whether a particular action causes a patient’s heart to beat or fail. In the context of software development, if you don’t have this type of fast feedback, then you can’t experiment, because you don’t know what’s working and what’s not working.
But observability has to go beyond telemetry and logs. Organizations need a culture that really integrates observability into how they work, using data as a way to learn and outlearn the competition. When we launch a feature into the market, we set it up so that we can perform A/B testing to determine which variant more effectively achieves our goals. We also use observability to aid problem resolution.
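To make the A/B testing idea concrete, here is a minimal Python sketch of one common way to compare two variants of a feature: a two-proportion z-test on conversion rates. It is an illustration only, not a method described in the interview. The function name, the conversion counts, and the choice of this particular test are all assumptions.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rate, B minus A.

    conv_a / conv_b are hypothetical counts of users who converted;
    n_a / n_b are the numbers of users shown each variant.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (for example, zero conversions in both groups).
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF evaluated at |z|, expressed through the error function.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)


if __name__ == "__main__":
    # Made-up telemetry: variant A converted 100 of 1,000 users, variant B 130 of 1,000.
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between variants is unlikely to be noise. The counts themselves would come from production telemetry, which is why this kind of experiment depends on observability.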
New Relic: How mature are organizations when it comes to observability?
Gene: For six years, along with Dr. Nicole Forsgren and Jez Humble, I’ve been a part of The State of DevOps Research, a cross-population study that spans more than 36,000 respondents. Each year, the findings have indicated that the gap between high performers and low performers is very wide.
Among the high performers, you see spectacular levels of performance: they deploy faster and more frequently, they fix problems more quickly, and their deployments are more successful. They have high levels of observability, as well as a bunch of great technical practices, great architectures, and great cultural norms. In contrast, among the lower performers, you see time to repair measured in days or weeks rather than in hours. So the gap between high and low performers spans several orders of magnitude.
New Relic: What can organizations do to become high performers?
Gene: They need to focus on three areas: putting in place the right architecture, developing the right technical practices, and establishing the right cultural norms.
First, can their teams independently develop, test, and deploy value? And is their architecture tightly or loosely coupled? That’s one element. Second, their technical practices should include continuous integration and continuous delivery, as well as proactive telemetry in the production environment. The third area is cultural norms: is there a culture where it is safe to tell bad news, or is there a culture of suppressing bad news? This is an important predictor of performance.
When you work in this new mode, you have higher workplace engagement and higher employee net promoter scores. People are bringing their best work to their work, and a lot of literature suggests that when you have a highly engaged workforce that cares about the customer, you end up with far better outcomes. That’s true not just in technology but for work in general.
New Relic: How important is it for organizations to have a cloud native approach to software development?
Gene: An organization’s architecture doesn’t necessarily have to be in the cloud, as long as teams can work independently of one another. This means they can test a component in isolation from every other component.
If you have to test in the presence of every other component, you need an integrated test environment, and now you’re shackled to a tightly coupled architecture. This type of architecture makes it impossible to identify which component is causing a problem and to catch issues before they go into production.
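As a rough illustration of testing one component without an integrated environment, the Python sketch below has the component depend on a small interface, so that a stub can stand in for the real collaborator during a test. Every name here (`OrderService`, `PaymentGateway`, `StubGateway`) is hypothetical and chosen only for the example. This is one of several ways to decouple components, not a technique prescribed in the interview.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """The only thing OrderService needs from the payments team's component."""

    def charge(self, customer_id: str, amount_cents: int) -> bool: ...


class OrderService:
    """Hypothetical component under test; it knows the interface, not the implementation."""

    def __init__(self, gateway: PaymentGateway) -> None:
        self._gateway = gateway

    def place_order(self, customer_id: str, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        charged = self._gateway.charge(customer_id, amount_cents)
        return "confirmed" if charged else "declined"


class StubGateway:
    """Test double that records calls and returns a preset result."""

    def __init__(self, succeed: bool) -> None:
        self.succeed = succeed
        self.calls: list[tuple[str, int]] = []

    def charge(self, customer_id: str, amount_cents: int) -> bool:
        self.calls.append((customer_id, amount_cents))
        return self.succeed


if __name__ == "__main__":
    # No real payment system is running; only OrderService's own logic is exercised.
    stub = StubGateway(succeed=True)
    service = OrderService(stub)
    print(service.place_order("customer-1", 2500))
    print(stub.calls)
```

Because the test touches only `OrderService` and a stub, a failure points at this component rather than at some unknown part of a shared environment.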
New Relic: Our research suggests that some organizations are struggling to recruit the talent they need to improve their software development. What’s your view on this?
Gene: A lot of research suggests that there’s a skills shortage but not a talent shortage. Overall, I think leaders aren’t doing enough to unleash the talent they already have within their organizations.
One example of a firm that has changed its approach to this is General Motors. For decades, its Fremont plant was the worst-performing automotive plant—not just in the US, but across the globe. But then GM entered a partnership with Toyota and the workers completed training in Japan. After this training, despite having the same people, the Fremont plant became one of the world’s best-performing automotive plants.