Red Sox versus Yankees. Farmers versus cowboys. The Jets versus the Sharks. Whatever the endeavor, we humans have a propensity to find opponents to our team’s goals—even when we’re actually all on the same side.
That dynamic sometimes plays out between traditional enterprise software development and operations staffers. For decades, dev and ops teams have all too often seen themselves as rivals rather than partners. It shouldn’t be this way—and it doesn’t have to be this way. The more each team understands the other’s needs, the more effective both can be.
That’s one of the goals behind the DevOps movement, of course, and one reason it continues to gain followers. And it’s the reason we recently asked sysadmins and operations staffers for responses to the question “If you could get your software development team to change one thing, what would it be?” (We also asked the reverse question of developers; their responses are in “3 Ways Ops Can Help Devs: A Developer Perspective.”)
The responses don’t necessarily reflect New Relic’s views—we’re dedicated to helping everyone get along and work together to meet their common goals—but they do help illuminate the depth of the existing divide, and perhaps suggest some approaches to bridging it.
One thing is for sure: progress will require both sides to pay more attention to each other’s perspective. “Ops needs to be aware of and collaborate much earlier into the dev process, and dev needs to be aware of and collaborate much later into the ops process,” says Dave Caplinger, director of technical product management at Solutionary.
3 key themes
While our survey was far from scientific, the ops folks we heard from seemed most invested in three key themes:
- Things break. Deal with it and plan for it.
- Design log files, installers, and other tools to help ops identify and fix problems.
- Security is an “everyone problem.”
Let’s take a closer look at each one.
1. Things break. Deal with it and plan for it
Too often, say some ops professionals, developers put all their attention on making software work today, with not enough thought to maintenance tomorrow. In sysadmins’ eyes, too many devs deploy their software (or hand it over to ops to deploy for them) and walk away. But ops, charged with keeping things going, can get frustrated when developers move on too soon.
“After we deploy, in spite of all our efforts, things will still break,” notes Caplinger. “Disks will fill. Network latency will be variable. Links will fail. Files you expect to write will be read-only. A process will run out of file descriptors. A critical, redundant system you depend on will be down. Systems need to keep running as well as possible in spite of this.”
Ops wants devs to avoid making assumptions about the state of anything. That means devs need to create better sanity checks and provide better inline comments and documentation. Because when the software fails in production, it’s the ops team that gets the phone call at 2 a.m., and they’ll need all the help they can get.
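To make "sanity checks" concrete, here is a minimal sketch of a startup check that tests two of the assumptions Caplinger lists: that the disk has room and that the files you expect to write are writable. The function name `preflight_problems` and the 100 MB default threshold are illustrative assumptions, not something the ops staffers we talked to prescribed.

```python
import os
import shutil


def preflight_problems(data_dir, min_free_bytes=100 * 1024 * 1024):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: the name and the default threshold are assumptions.
    """
    problems = []
    if not os.path.isdir(data_dir):
        # Nothing else can be checked if the directory is missing.
        problems.append(f"{data_dir}: directory does not exist")
        return problems
    if not os.access(data_dir, os.W_OK):
        problems.append(f"{data_dir}: not writable by this process")
    free = shutil.disk_usage(data_dir).free
    if free < min_free_bytes:
        problems.append(
            f"{data_dir}: only {free} bytes free, need {min_free_bytes}"
        )
    return problems
```

An application could run this at startup (and again periodically), log each problem, and refuse to start or degrade gracefully rather than fail halfway through a write.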
A classic example is when developers assume always-on connectivity. “The network is not a static monolith that never changes,” one ops staffer noted. “We’re planning a data center network upgrade. It will require disconnecting every server and reconnecting them to the new switches.” That could cause some apps to think the entire world has ended and crash in an untidy heap.
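One common way to avoid the untidy heap is to treat a lost connection as a temporary condition and retry with an increasing delay. The sketch below is one possible approach, not the only one; `call_with_retry` and its parameters are hypothetical names chosen for illustration.

```python
import logging
import time

log = logging.getLogger("retry")


def call_with_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on a connection or timeout error, wait and retry.

    The delay doubles after each failure (0.5s, 1s, 2s, ...). The last
    failure is re-raised so the caller can still decide what to do.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError) as exc:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning(
                "attempt %d failed (%r); retrying in %.1fs", attempt, exc, delay
            )
            sleep(delay)
```

Wrapping a network call this way, for example `call_with_retry(lambda: fetch_orders())` where `fetch_orders` stands in for your own function, lets a short outage during a switch upgrade show up as a few warnings in the log instead of a crash.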
The key is to plan for maintainability, adds another sysadmin. “Software usually stays in production far longer than the original coders anticipated, so maintenance should be provided for during coding,” he says. He sees it as the developers’ responsibility to help ops with managing the failure points, and with understanding the developers’ assumptions.
2. Design log files, installers, and other tools to help ops identify and fix problems
Application log files record events, the IP addresses they came from, result codes, and similar data. Happily, plenty of tools can help organize those log files and interpret what happened.
But ops is often frustrated by useless event log entries that don’t help anyone determine what went wrong. From the ops perspective, there’s a big difference between “I captured information” and “Anyone can figure out what happened.” One ops staffer would like to convince developers not to conflate multiple errors into the same error code: “When a service goes down, I do not want to play Russian Roulette on which process to restart.” The same principle applies to application installers, scripts, and other packaging for software that runs on a desktop or server OS: when something fails, the tool should say clearly what failed.
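As a rough illustration of "one error, one code," the sketch below gives each failure its own code and names the affected component in every log line. The codes, the `Failure` enum, and the helper names are made up for this example; a real application would define its own.

```python
import enum
import logging


@enum.unique  # raises at import time if two failures share a code
class Failure(enum.Enum):
    # Hypothetical codes, for illustration only.
    DB_UNREACHABLE = "E1001"
    DISK_FULL = "E1002"
    CONFIG_INVALID = "E1003"


def format_failure(failure, component, detail):
    """Build a log line that says what failed, where, and why."""
    return (
        f"code={failure.value} name={failure.name} "
        f"component={component} detail={detail}"
    )


def log_failure(logger, failure, component, detail):
    """Log the failure at ERROR level using the shared format."""
    logger.error(format_failure(failure, component, detail))
```

With this in place, a call such as `log_failure(logging.getLogger("uploader"), Failure.DISK_FULL, "uploader", "/var/data at 100%")` tells the person on call which process is in trouble without any guessing.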
3. Security is an “everyone problem”
Ops folks also want developers to remember that IT security best practices apply to them, too. That includes basics like passwords.
“The last thing I need to worry about is simplified, easy-to-guess default passwords floating around in your code,” says Greg Willis, director of technology operations and systems architecture at Morpheus Data. “The ones that always get you are things like the Java keystore that never got changed from ‘changeit.’” If you’re lucky, a sysadmin will change these third-party software passwords during setup. But how much do you want to count on luck?
“Security is not just an ops problem,” explained another ops professional. It’s an everyone problem.
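One way developers can take luck out of the equation is to have the application itself reject well-known default passwords at startup. The sketch below assumes a small deny list; only “changeit” comes from Willis’s example, and the other entries, along with the name `check_secret`, are illustrative assumptions.

```python
# "changeit" is the keystore default mentioned above; the rest are
# assumed examples of commonly guessed values.
KNOWN_DEFAULTS = {"changeit", "password", "admin", "123456"}


def check_secret(name, value):
    """Return value if it looks deliberately set; otherwise raise ValueError."""
    if not value:
        raise ValueError(f"{name} is not set")
    if value.lower() in KNOWN_DEFAULTS:
        raise ValueError(f"{name} is still a well-known default; change it")
    return value
```

Calling something like `check_secret("KEYSTORE_PASSWORD", os.environ.get("KEYSTORE_PASSWORD"))` during startup (with `os` imported, and the variable name being an assumption) turns a forgotten default into a loud, early failure instead of a quiet security hole.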
It all comes down to better communication
As these comments show, sometimes the us-versus-them arguments can make us lose track of the shared goal: creating and delivering software that brings users joy. “We’re not done with the release at code commit (or package build, or…),” Caplinger says. “We’re done only if it’s live in production and customers see and use it.”
If you’re in ops, please let us know on Twitter (@newrelic) what you’re looking for from the dev team.