New Relic customers often cite distributed tracing as one of their most important troubleshooting and performance optimization tools. That’s why New Relic is implementing Trace Context: a W3C standard that makes distributed tracing easier to implement, more reliable, and ultimately more valuable for developers working with modern, highly distributed applications.
For nearly two years, New Relic has participated in the W3C Trace Context Working Group—helping to define the standard and shepherd it through the approval process. Today, the W3C announced a major milestone in this process: The Trace Context specification has reached “Proposed Recommendation” status, which means it is just one step away from becoming a final W3C Recommendation.
New Relic is already working to implement the Trace Context specification, and will release support for its Java, Node.js, Ruby, Python, and Go agents in early 2020. In this post, we’ll explain why Trace Context is so important for our customers who use distributed tracing; what the transition to W3C Trace Context compliance will look like to our customers; and how New Relic’s support for Trace Context exemplifies our strategic commitment to open source software and open standards.
The trouble with distributed tracing today
Distributed tracing is an essential tool for any developer working with highly distributed microservices applications, allowing them to track requests as they traverse multiple microservices. (To learn more, review our introduction to distributed tracing, and also learn how New Relic implements its own tracing solution.)
Every tracing tool, however, requires a way to “correlate” each step of a trace, in the correct order, along with other necessary information to identify and diagnose performance. This involves assigning a unique ID to each transaction, assigning a unique ID to each step in a trace, encoding this contextual information as a set of HTTP headers, and passing (or propagating) the headers and encoded context from one service to the next as the trace makes its way through an application environment.
Previously, each distributed tracing tool employed custom headers and context formats—for example, Zipkin’s B3 format and New Relic’s own proprietary format. This wasn’t a problem when trace context headers mostly traveled between services monitored by a single tracing tool or when headers rarely propagated beyond a single organization’s network and middleware infrastructure.
Today, however, this jumble of mutually incompatible header formats is definitely a problem. Many enterprises, for example, let teams pick their own tracing tools; when these tools receive trace context headers they don’t understand, they typically drop the headers and break the trace that relied upon them. Trace context headers are also more likely to traverse middleware boundaries including proxies, service meshes, and messaging systems along the way. Some of these devices will pass along proprietary headers intact, but many others will drop them, once again resulting in broken traces.
Trace Context: clearing the limitations of legacy tracing tools
As these issues grew more costly and complicated, APM vendors, cloud platform providers, and other commercial vendors joined with members of the open source community to form a working group dedicated to creating a common context propagation format—what we now know as the W3C Trace Context standard.
The Trace Context specification defines a pair of standardized context headers that serve to propagate context correlation information between services and middleware:
The traceparent header contains the data elements that every distributed tracing model requires to define and propagate context: a trace ID, a parent ID, and a sampled flag.
The tracestate header holds vendor-specific, contextual data, typically in order to support additional functionality or optimizations associated with a particular tracing tool.
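To make the two headers more concrete, here is a minimal Python sketch of how a tracer might build and read them. It assumes the version 00 layout of traceparent (four lowercase hex fields separated by hyphens) and the comma-separated key=value layout of tracestate. The function and class names are hypothetical and chosen for illustration only; this is not New Relic agent code, and it skips details such as forward compatibility with future header versions.

```python
import re
import secrets
from typing import List, NamedTuple, Optional, Tuple

# Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random 16-byte trace ID and 8-byte parent ID."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # The sampled flag is the lowest bit of the flags byte.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


def update_tracestate(existing: str, key: str, value: str) -> str:
    """Put this vendor's entry first and keep other vendors' entries intact."""
    members: List[Tuple[str, str]] = []
    for item in existing.split(","):
        item = item.strip()
        if "=" not in item:
            continue  # skip empty or malformed list members
        k, v = item.split("=", 1)
        if k != key:  # drop our own stale entry, keep everyone else's
            members.append((k, v))
    members.insert(0, (key, value))
    # Keep at most 32 list members.
    return ",".join(f"{k}={v}" for k, v in members[:32])


if __name__ == "__main__":
    header = new_traceparent()
    print(header, parse_traceparent(header))
    print(update_tracestate("congo=t61rcWkgMzE", "rojo", "00f067aa0ba902b7"))
```

The key point is the split of responsibilities: any tool can read traceparent to keep the trace connected, while each vendor reads and writes only its own tracestate entry and passes the rest along unchanged.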
Given New Relic’s long history of involvement in the Trace Context project, we’re excited to see the specification reach a stage where we can implement the standard for our customers, and we’re committed to implementing Trace Context support in a way that fits seamlessly into our current distributed tracing experience. New Relic agents will maintain backward compatibility, accepting and emitting both the W3C format and the New Relic proprietary format, so customers can upgrade agents without special consideration for which format is being used.
To understand why New Relic’s commitment to backward compatibility in our Trace Context implementation is important, consider the following diagram:
In this scenario, a transaction begins in a service instrumented by a New Relic agent that doesn’t support the W3C Trace Context. When Service 1 calls Service 2, the New Relic agent will automatically propagate context using New Relic’s proprietary format.
Service 2 is instrumented by a New Relic agent that supports both the New Relic proprietary format and the W3C Trace Context format. It will accept the New Relic format, and when it makes a call to Service 3, it will automatically propagate context using both the New Relic proprietary format and the W3C format.
Service 3 is instrumented by a New Relic agent that doesn’t support the W3C format. When it receives the request from Service 2, it will find the New Relic proprietary header that it expects—and the trace will remain intact.
This scenario illustrates how easy and transparent it will be for New Relic customers to transition to the new W3C standard.
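The three-service scenario above can be sketched in a few lines of Python. This is an illustrative model only, not the actual agent logic: the header name x-legacy-trace and its "trace_id:span_id" payload are hypothetical stand-ins for a proprietary format, and the function names are invented for this example.

```python
import secrets
from typing import Dict, Optional

# Hypothetical stand-in for a proprietary header; not a real header name.
LEGACY_HEADER = "x-legacy-trace"


def extract_trace_id(headers: Dict[str, str], supports_w3c: bool) -> Optional[str]:
    """Read the trace ID from whichever format this agent understands."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if supports_w3c and "traceparent" in lowered:
        parts = lowered["traceparent"].split("-")
        if len(parts) == 4 and len(parts[1]) == 32:
            return parts[1]
    legacy = lowered.get(LEGACY_HEADER, "")
    if ":" in legacy:
        return legacy.split(":", 1)[0]
    return None


def outgoing_headers(incoming: Dict[str, str], supports_w3c: bool) -> Dict[str, str]:
    """Continue the incoming trace (or start one) and emit every known format."""
    trace_id = extract_trace_id(incoming, supports_w3c) or secrets.token_hex(16)
    span_id = secrets.token_hex(8)  # ID for this service's own step
    headers = {LEGACY_HEADER: f"{trace_id}:{span_id}"}
    if supports_w3c:
        # Upgraded agents emit both formats so older agents downstream still work.
        headers["traceparent"] = f"00-{trace_id}-{span_id}-01"
    return headers


if __name__ == "__main__":
    service1 = outgoing_headers({}, supports_w3c=False)       # older agent
    service2 = outgoing_headers(service1, supports_w3c=True)  # upgraded agent
    service3 = outgoing_headers(service2, supports_w3c=False) # older agent
    for name, sent in (("1", service1), ("2", service2), ("3", service3)):
        print(f"Service {name} sends:", sent)
```

In this model, every hop carries the same trace ID even though only Service 2 understands the W3C format, which is the property that keeps the trace intact during a gradual agent upgrade.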
Of course, Trace Context is a useful and important tool for ensuring that New Relic’s native distributed tracing tools can traverse services instrumented with agents from other vendors without the risk of broken traces, and reliably traverse third-party components, including proxies and API gateways. But it’s just as important that Trace Context will confer the same advantages upon open source tracers, enabling our customers to incorporate tracing telemetry from any source, at any time, and to implement traces across highly distributed application environments. This makes Trace Context a critical, and very welcome, technology for New Relic’s open source telemetry initiative.
Connecting customers with the benefits of open standards
There’s still plenty of work to be done to improve interoperability for distributed tracing tools and telemetry. For example, while the current Trace Context project focused on HTTP (a commonly used protocol with no built-in capabilities for propagating trace context data), the same group of committee members is also working to define trace context propagation for other protocols, such as AMQP and MQTT for Internet of Things (IoT) environments. Members are also looking ahead to the goal of achieving true end-to-end context propagation, including improvements to better support web browsers and other client-side devices as a next-step priority.
In the meantime, New Relic is excited to support the W3C Trace Context specification, and we’re looking forward to working with commercial vendors and open source projects to promote support for the standard. Trace Context is a great example of how competing firms can come together in an effort that benefits our customers, regardless of their chosen vendor or tracing tool.