Unexpected Detours: Navigating Variable Topology in the Cloud

Who doesn’t love turn-by-turn navigation software? Now that most of us have it on our phones or built into our cars, it’s hard to imagine how we ever got anywhere without it. In fact, we’ve probably come to depend on this technology a little too much, because sometimes nav systems make mistakes. And sometimes the directions they give you are based on outdated information.

Like when TomTom doesn’t know the streets are blocked off for construction. Or it’s winter and Google Maps suggests a road that was buried by snow weeks ago. Or you drive into a lake because the GPS tells you to. The landscape we traverse is constantly evolving. New paths are forged and old ones are abandoned, occasionally requiring us to re-route on the fly. Negotiating these unexpected detours is like navigating the topology of a computing network, only with more swearing.

Over the past few weeks, we’ve seen the opposing schools of thought about the Fallacies of Distributed Computing and their relevance to web systems. Our previous post covered the lone exception: everyone agrees the Fallacy of Network Security must not be ignored when building applications for a web environment. While there is also consensus on some elements of today’s topic — specifically, that topology on the web is always changing — opinions differ on how to respond to this inherent network condition.

Failing to consider the effects of topology fluctuations is kind of like expecting your hometown to always look exactly like it did when you were growing up. Network topology changes just like everything else does. It’s natural; it’s necessary; it’s the circle of life. And it’s been true since the dawn of distributed computing. Servers are added, architecture undergoes renovations, and resources get moved. Infrastructure has to be rearranged from time to time. Sometimes it happens because of an outage, and sometimes it’s just because there’s a better way to do things. Either way, these changes are inevitable.

Moving to the cloud only makes network topology more dynamic, not less. As Brian Doll wrote last year, that’s one of the key advantages of the web environment: the ease with which you can adapt to evolving needs and conditions. Ignoring the Fallacy means building web applications that rely on static topology. That not only nullifies some of the cloud’s major benefits by hindering flexibility and scalability, but also puts apps at greater risk of failing if a single variable is altered. And that doesn’t make a whole lot of sense.

There’s no debate that the topology of web systems is in constant flux. There is, however, an argument that web technologies enable developers to overlook the issue when building apps. As we’ve done throughout this series, we’ll look at Tim Bray’s article The Web vs. The Fallacies to examine this contrasting perspective. Bray’s piece doesn’t suggest that this Fallacy has become true, but it does contend that “application builders almost never have to think about the problem.”

He reasons that routing problems caused by changing topology are solved beyond the application level, “by the DNS, the network stack, and the Internet backbone operators.” Because nodes and resources are addressed by URIs, those solutions are shared across the web. Everyone within the environment, therefore, reaps the benefits without necessarily even realizing the topology has shifted.

Although we arrive at a different conclusion on the matter, you can’t argue with his assessment of what’s going on behind the scenes. In fact, a paper we’ve referenced several times in this space cites some good examples that illustrate Bray’s point. Both the WS-Addressing standard and SQL Server Service Broker — important for asynchronous interactions — simplified web development with next-hop routing mechanisms that point requests to the nearest router, rather than to their eventual destination. These solutions assume the system’s infrastructure (and thus message navigation) is constantly fluctuating, and deliver workarounds to give applications more flexibility.
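To make the next-hop idea concrete, here is a minimal sketch in Python. It is not how WS-Addressing or Service Broker are implemented; the node names and the `route` helper are made up for illustration. Each node only knows which neighbor to hand a message to, so a topology change is absorbed by updating one table rather than every sender.

```python
def route(tables, source, destination, max_hops=16):
    """Follow next-hop entries from source until the destination is reached."""
    path = [source]
    node = source
    while node != destination:
        if len(path) > max_hops:
            raise RuntimeError("routing loop or path too long")
        # Each node only knows the next hop, not the full path.
        node = tables[node][destination]
        path.append(node)
    return path


# Hypothetical topology: the app hands everything for "db" to router-a.
tables = {
    "app": {"db": "router-a"},
    "router-a": {"db": "db"},
}
before = route(tables, "app", "db")

# The database moves behind a new router. Only router-a's table changes;
# the app's own entry is untouched.
tables["router-a"]["db"] = "router-b"
tables["router-b"] = {"db": "db"}
after = route(tables, "app", "db")
```

The sender's configuration is identical before and after the move, which is the flexibility these mechanisms are after.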

Indeed, developers are fortunate to be able to take advantage of such innovations. But depending solely on external solutions is like relying too much on a car’s nav system. It puts the fate of an application in someone else’s hands, when there are still methods within the developer’s control to account for unpredictable topology and facilitate successful point-to-point communication. The other flaw in the shared-solution approach is that it depends on URIs and redirects being meticulously maintained, which we know doesn’t always happen.

Instead, application builders should confront the Fallacy directly and take extra precautions that ensure the flexibility and reliability of their apps in a fluid environment. The systems expert who authored the aforementioned paper also advocates this more hands-on mentality and presents some practical suggestions for how to apply it.

First, he recommends, do whatever possible to avoid prescribing specific endpoints or routing paths because you don’t know if or when they’ll get modified. Then, consider identifying resources with DNS names instead of IP addresses to further minimize the impact of any network restructuring. It’s advisable as well to offer additional support in the form of location transparency or discovery services. Employing a few simple tactics like these can help developers limit their risk of routing problems and achieve greater control over app performance amidst turbulent topology.
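A rough sketch of those tactics in Python might look like the following. Everything named here is an assumption for illustration: the `REGISTRY` table stands in for a real discovery service, the `SERVICE_*` environment variables are a made-up convention, and `orders-db` is a hypothetical service. The point is simply that the code refers to a logical name, maps it to a DNS name, and resolves that name when it connects rather than baking in an IP address.

```python
import os
import socket

# Hypothetical logical-name -> DNS-name table standing in for a discovery
# service. The entry below is invented for this example.
REGISTRY = {
    "orders-db": "localhost",
}


def lookup(service, registry=REGISTRY):
    """Map a logical service name to a DNS name.

    An environment variable (assumed convention: SERVICE_<NAME>) overrides
    the table, so operators can repoint a service without a code change.
    """
    env_key = "SERVICE_" + service.upper().replace("-", "_")
    return os.environ.get(env_key, registry[service])


def resolve(host, port):
    """Resolve the name at call time and return the distinct IP addresses."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        ip = info[4][0]  # info is (family, type, proto, canonname, sockaddr)
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect(service, port, timeout=2.0):
    """Open a TCP connection to whatever the service name points at right now."""
    host = lookup(service)
    # create_connection performs its own name lookup on every call.
    return socket.create_connection((host, port), timeout=timeout)
```

Because the lookup happens on each new connection, a server that moves to a new address is picked up on the next connect, subject to whatever DNS caching sits in between.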

Keep in mind, though, that topology changes can degrade application performance even if you’re proactive about avoiding routing problems. Let’s say you relocate a database to the other side of the globe while keeping the same hostname. Your app is still going to work as intended; it will find the database and access the data it needs. But performance is sure to suffer from a jump in latency between the app and database. It’s just another reminder that, although some technologies and techniques address the effects of the Fallacies, there’s no escaping the conditions intrinsic to distributed systems.
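One way to notice that kind of silent slowdown is to measure it. The sketch below, again in Python, times a TCP connection and compares the median of a few samples to a budget. The function names and the 50 ms budget in the usage note are assumptions for illustration, not recommendations from the paper or from Bray.

```python
import socket
import statistics
import time


def connect_latency_ms(host, port, timeout=2.0):
    """Return how long a TCP connect to (host, port) takes, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # we only care about the time to establish the connection
    return (time.perf_counter() - start) * 1000.0


def exceeds_budget(samples_ms, budget_ms):
    """True if the median sample is over budget; the median ignores one-off spikes."""
    return statistics.median(samples_ms) > budget_ms
```

For example, collecting a handful of `connect_latency_ms("db.example.internal", 5432)` samples (a hypothetical hostname) and passing them to `exceeds_budget(samples, 50)` would flag a database that still resolves correctly but now sits an ocean away.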

Next week: More circumstances beyond your control – the impact of web standards on administrator policies.

leigh@newrelic.com

Marketing Manager, Content
