When it comes to building complex IT systems, there’s one thing you can know for sure: inevitably something will break. So how do you plan for things to not go according to plan?
New Relic’s brand new four-part series on un-reliability engineering is designed to show you exactly how to find and fix problems as quickly as possible. You can watch all four videos in order here, or read on to jump to the episode you’re most interested in seeing:
Episode one—What is Un-Reliability Engineering?—explores the current landscape of site reliability engineering. Your host, New Relic’s Clay Smith, explains how the wide variety of software development tools, languages, and frameworks can create highly complex, constantly changing environments. Hard-to-predict combinations of states, configurations, and code changes make it almost impossible to plan for every situation, and they cause systems to behave in unexpected ways at the worst possible moments. Hence, site un-reliability.
Now that we know the pitfalls of designing and running complex systems, episode two—Understanding Complex Software—takes a deeper dive into different data types and instrumentation techniques that can help us understand and operate complex systems.
Observability is critical, but how do you actually enable the level of visibility needed to keep your system up and running? In episode three—Using Instrumentation to Enhance Functionality—we look at what instrumentation is and how to incorporate it into your software stack.
In the final episode—Instrumentation ROI—Clay explains how to demonstrate the return on investment of instrumentation in complex systems. The bottom line: Instrumentation can not only make your job easier, but also simplify communicating the value of your work to the rest of your organization.
Site un-reliability is real and inescapable. But this New Relic video series explains how proper instrumentation can help you manage and minimize the disruption it causes.