(This post is adapted from a talk called “Digital Moments of Truth While Building Our Mobile Apps” that I presented at FutureStack: New York 2017.)
If you build mobile applications, you undoubtedly care about quality. That’s a seemingly obvious statement, but how do you stay focused on quality when your team depends on services owned by others, or after you’ve released your app to the world?
The mobile applications team at New Relic manages the full suite of our applications for iOS and Android, and we’ve had some pretty valuable moments of truth when it comes to quality and dependency management. Reactively, we have crash alerts turned on for all our mobile applications, so when a crash occurs we find out immediately. Proactively, we monitor all the downstream APIs our apps consume.
Every company has its moments of truth—from Black Friday transaction overloads to payload errors in third-party APIs—and I’m going to share just a couple we’ve experienced, and how we used New Relic ourselves to help us through them, proving out the utility of well-defined monitoring scenarios.
Moment of truth #1: Reacting right on time
You need to monitor your applications. Period. You need to track application performance and network performance. You need to know when things crash or go wrong. And you need to know it before your users do.
Not too long ago, we released a new version of our New Relic app for Android that included a big new feature: New Relic Infrastructure. We watched user counts grow as customers upgraded the app. Everything looked great. The following morning, however, New Relic Mobile alerted us to a crash.
Then we got a few more alerts. Whatever the issue was, it wasn’t affecting all our users, just some. We used New Relic Mobile’s crash reporting to drill down into the details of the crash. We looked through the different types of devices and operating systems affected, and we identified that the crash was happening in only one specific version of Android: 4.4.4.
At first this seemed strange, but once we investigated a little further, we realized the version of Android build tools we had was not compatible with some newer libraries used in Infrastructure. We quickly updated the build tools, and pushed a new release to the Google Play store within a few hours.
We got off lucky, as only a very small portion of customers even noticed. But without good monitoring and alerting in place, this could have turned into a much larger problem.
Moment of truth #2: Proactively watching what we consume
If you have a large engineering organization like we do at New Relic, you know that downstream dependencies can sometimes be overlooked. I often hear app developers say that because they don’t own the APIs their service consumes, they don’t really care about them.
And that is the wrong attitude.
APIs are the lifeblood of your application, I tell them. Your app is dead without that data.
You need to monitor the APIs you use, even the ones you don’t own. You need to see the full health of your endpoints, and you need to know about failures, timing data, and availability.
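To make "failures, timing data, and availability" concrete, here is a minimal Python sketch of how a series of check results for one endpoint could be rolled up into those three numbers. It is an illustration only: `ProbeResult` and `summarize` are hypothetical names, the sample values are made up, and this is not how New Relic computes anything.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ProbeResult:
    """One check against an endpoint (hypothetical shape, for illustration)."""
    ok: bool           # True if the check passed
    elapsed_ms: float  # how long the request took


def summarize(results):
    """Roll a list of ProbeResult objects up into failures, availability, and timing."""
    if not results:
        raise ValueError("need at least one probe result")
    failures = sum(1 for r in results if not r.ok)
    return {
        "checks": len(results),
        "failures": failures,
        "availability": (len(results) - failures) / len(results),
        "median_ms": median(r.elapsed_ms for r in results),
    }


# Made-up sample data: four checks, one of them a slow failure.
sample = [
    ProbeResult(True, 100.0),
    ProbeResult(True, 200.0),
    ProbeResult(False, 900.0),
    ProbeResult(True, 300.0),
]
summary = summarize(sample)
print(summary)
```

Even a rollup this small shows why all three numbers matter: a single slow failure barely moves the median, but it does show up in the failure count and the availability figure.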
The mobile applications team uses New Relic Synthetics checks to monitor the APIs we consume. We specifically monitor preproduction APIs so we can be more proactive in watching for changes.
For example, we monitor our staging environment to watch for breaking changes in the APIs we consume (primarily, changes to JSON structures). We also wrote an API test that runs in staging: it sends a request with a specific structure and ensures the response contains specific data. A 200 OK response isn’t enough; we expect the payload to have a clearly defined structure.
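The idea of checking a payload's structure rather than just its status code can be sketched in a few lines of Python. This is not our actual Synthetics script; it is a minimal illustration, and the `find_structure_errors` helper, the `EXPECTED` shape, and the field names (`id`, `name`, `hosts`, `hostname`) are all hypothetical.

```python
def find_structure_errors(payload, expected, path="$"):
    """Compare a decoded JSON payload against an expected shape.

    `expected` is a dict of field -> shape, a one-element list holding the
    shape of every item, or a Python type such as str or int.
    Returns a list of human-readable problems (empty if the shape matches).
    """
    if isinstance(expected, dict):
        if not isinstance(payload, dict):
            return [f"{path}: expected object, got {type(payload).__name__}"]
        errors = []
        for key, sub_shape in expected.items():
            if key not in payload:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(
                    find_structure_errors(payload[key], sub_shape, f"{path}.{key}")
                )
        return errors
    if isinstance(expected, list):
        if not isinstance(payload, list):
            return [f"{path}: expected array, got {type(payload).__name__}"]
        errors = []
        for index, item in enumerate(payload):
            errors.extend(
                find_structure_errors(item, expected[0], f"{path}[{index}]")
            )
        return errors
    # Leaf: `expected` is a plain Python type.
    if not isinstance(payload, expected):
        return [f"{path}: expected {expected.__name__}, got {type(payload).__name__}"]
    return []


# Hypothetical shape the app relies on.
EXPECTED = {"id": int, "name": str, "hosts": [{"hostname": str}]}

good = {"id": 1, "name": "web", "hosts": [{"hostname": "a.example"}]}
bad = {"id": "1", "hosts": [{}]}  # wrong type, one missing field, one missing nested field

good_errors = find_structure_errors(good, EXPECTED)
bad_errors = find_structure_errors(bad, EXPECTED)
print(good_errors)
print(bad_errors)
```

In a real check, the payload would come from the staging endpoint's response body, and any non-empty error list would fail the monitor. Extra fields are deliberately ignored here, on the assumption that additions are usually safe while removals and type changes are what break an app.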
You may be asking, why do we go through all this trouble? Well, here’s your second digital moment of truth:
It started with our Synthetics monitor alerting us that one of our staging APIs was failing. It was the middle of the day; the production environment looked fine, Slack was quiet, and no one was panicking.
As we dug into the alert, we saw that a new version of the API in question had recently been deployed to staging …
… which was intriguing, because that meant that the downstream team was probably soon going to release that new version to production …
… and that would have broken all our mobile apps.
But, like I said, there are no disasters in this story. We notified the owners of that API and they fixed it well before it made it to production.
Your moment of truth is waiting
No matter what your business, some sort of moment of truth, when your systems can’t afford to fail, is always on the horizon. Coping means getting ahead of it, and getting ahead means instrumenting everything and monitoring proactively as well as reactively. Your moments of truth are right around the corner. Will you be prepared?