Recently, an important announcement came out of 2019’s CloudNativeCon Europe: Two open source distributed tracing projects, OpenCensus and OpenTracing, joined forces under the Cloud Native Computing Foundation (CNCF) as a new project called OpenTelemetry. While the differences between OpenCensus and OpenTracing haven’t always been clear to the uninitiated, both projects have been critical tools for those seeking to implement observability of their systems using open standards.
Over time, the maintainers of OpenCensus and OpenTracing recognized that if users were going to succeed with distributed tracing, a unified standard would be of paramount importance. That’s why OpenTelemetry sets out to merge these projects and create “a single set of system components and language-specific telemetry libraries” that standardizes how the industry uses metrics, traces, and eventually logs to enable observability. The end state, as described in Merging OpenTracing and OpenCensus: A Roadmap to Convergence, is “a complete telemetry system…compatible with most major OSS and commercial backends.”
A major component of the OpenTelemetry specification is distributed tracing. Let’s look a little deeper into what distributed tracing is and how OpenTelemetry unifies the OpenCensus and OpenTracing projects.
What is distributed tracing?
To understand why distributed tracing is so important, it’s helpful to look at how software environments are changing. Cloud-native technologies like containers, microservices, and serverless are helping forward-thinking software organizations build, scale, and operate business-critical applications more quickly. These applications and sites increasingly rely on interconnected cloud-based services, and troubleshooting a request as it passes between those services can’t always be done with logs and metrics alone.
This is where distributed tracing comes into play: It provides developers with a detailed view of individual requests as they “hop” through a system of microservices. With distributed tracing, you can:
- Trace the path of a request as it travels across a complex system.
- Discover the latency of the components along that path.
- Know which component in the path is creating a bottleneck or failure.
In other words, tracing is about analyzing, recording, and describing transactions.
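To make the three capabilities above concrete, here is a minimal Python sketch that works on an already-collected trace. The `Span` record, its fields, and the service names are hypothetical. The sketch assumes each component appears once per trace and that a span's children run one after another, so a component's own latency is its span's duration minus its children's durations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical minimal span record: one timed unit of work in a request."""
    span_id: str
    component: str
    start_ms: int
    end_ms: int
    parent_id: Optional[str] = None

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def path_to(spans: list[Span], span_id: str) -> list[str]:
    """Trace the path: walk parent links back to the root, return components in call order."""
    by_id = {s.span_id: s for s in spans}
    hops: list[str] = []
    current: Optional[Span] = by_id[span_id]
    while current is not None:
        hops.append(current.component)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return list(reversed(hops))


def self_times(spans: list[Span]) -> dict[str, int]:
    """Latency each component added itself: its duration minus its direct children's.

    Assumes children of one parent do not overlap in time.
    """
    children_ms = {s.span_id: 0 for s in spans}
    for s in spans:
        if s.parent_id is not None:
            children_ms[s.parent_id] += s.duration_ms
    return {s.component: s.duration_ms - children_ms[s.span_id] for s in spans}


def bottleneck(spans: list[Span]) -> str:
    """The component that contributed the most latency of its own."""
    times = self_times(spans)
    return max(times, key=lambda component: times[component])


trace = [
    Span("a", "frontend", 0, 120),
    Span("b", "auth-service", 5, 20, parent_id="a"),
    Span("c", "inventory-service", 25, 110, parent_id="a"),
    Span("d", "inventory-db", 30, 100, parent_id="c"),
]
print(path_to(trace, "d"))  # ['frontend', 'inventory-service', 'inventory-db']
print(self_times(trace))    # {'frontend': 20, 'auth-service': 15, 'inventory-service': 15, 'inventory-db': 70}
print(bottleneck(trace))    # inventory-db
```

The frontend takes 120 ms end to end, but most of that time is spent waiting on `inventory-db`, which is exactly what this kind of per-component latency breakdown surfaces.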
Key concepts of distributed tracing
No matter which specification your distributed tracing implementation follows, there are a handful of universal concepts to know. These terms inform any discussion of OpenTelemetry and how it will eventually replace OpenTracing and OpenCensus.
Trace: A record of the activity generated by a single request as it moves through a distributed system. A trace is a tree, or a directed acyclic graph (DAG), of spans.
Spans: Spans are named, timed operations representing a contiguous segment of work in a trace. Spans are related to one another through a parent-child, or causal, relationship. In the following image, spans B and C are children of span A, with span G “following from” span A.
Root span: The root span is the first span in a trace. The root span’s duration often represents the duration of the entire trace, or comes very close to it. In the above example, span A is the root span.
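The tree structure and the root span can be illustrated with a small Python sketch. The `Span` class is hypothetical and the timestamps are made up; `child_of` and `follows_from` mirror the two reference types OpenTracing defines. Span G follows from A and finishes after A does, which is why the root span’s duration only approximates the duration of the whole trace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical minimal span: a named, timed operation.
    name: str
    start_ms: int
    end_ms: int
    parent: Optional[str] = None  # name of the referenced span, if any
    ref_type: str = "child_of"    # "child_of" or "follows_from"


def root_span(spans: list[Span]) -> Span:
    """The root span is the one span with no parent reference."""
    roots = [s for s in spans if s.parent is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root span, found {len(roots)}")
    return roots[0]


def print_tree(spans: list[Span], name: str, depth: int = 0) -> None:
    """Print the span tree depth-first, labeling each non-root span's reference type."""
    span = next(s for s in spans if s.name == name)
    label = "" if span.parent is None else f" ({span.ref_type})"
    print("  " * depth + f"{span.name} [{span.start_ms}-{span.end_ms} ms]{label}")
    for child in spans:
        if child.parent == name:
            print_tree(spans, child.name, depth + 1)


spans = [
    Span("A", 0, 100),
    Span("B", 10, 40, parent="A"),
    Span("C", 45, 90, parent="A"),
    Span("G", 100, 130, parent="A", ref_type="follows_from"),
]
root = root_span(spans)
trace_duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
print_tree(spans, root.name)
print(root.end_ms - root.start_ms, trace_duration)  # 100 130
```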
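Context propagation: Tracing doesn’t work without context propagation. Because a distributed trace traverses more than one service, uniquely identifying information (at minimum, the trace ID and the ID of the calling span) must travel with the request along every path it may take, typically in request headers, so that each service can attach its spans to the same trace.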
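One concrete format for carrying this context over HTTP is the W3C Trace Context `traceparent` header, which packs a version, a 32-hex-character trace ID, a 16-hex-character parent span ID, and trace flags into a single value. The Python sketch below hand-rolls `inject` and `extract` functions for it; the function names are our own, not a library API. It is simplified: it accepts only version `00` and looks up the header name case-sensitively, whereas HTTP header names are case-insensitive.

```python
import re
import secrets
from typing import Optional

# traceparent = version "-" trace-id "-" parent-id "-" trace-flags (lowercase hex).
# This sketch only accepts version "00".
TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_trace_id() -> str:
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex characters


def new_span_id() -> str:
    return secrets.token_hex(8)   # 8 random bytes -> 16 hex characters


def inject(headers: dict[str, str], trace_id: str, span_id: str, sampled: bool) -> None:
    """Copy the current span's identity into outgoing request headers."""
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def extract(headers: dict[str, str]) -> Optional[tuple[str, str, bool]]:
    """Return (trace_id, parent_span_id, sampled) from incoming headers, or None."""
    match = TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


# Service A starts a trace and calls service B.
trace_id, span_a = new_trace_id(), new_span_id()
outgoing: dict[str, str] = {}
inject(outgoing, trace_id, span_a, sampled=True)

# Service B joins the same trace, records span A as its parent, and gets its own span ID,
# which it would inject in turn when calling further services.
received = extract(outgoing)
span_b = new_span_id()
print(received == (trace_id, span_a, True))  # True
```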
Tracer interface: A tracer interface provides the means to create and interact with spans. In OpenTracing, for example, vendors write tracers that inject and extract the span context, the metadata needed to connect spans across a system of microservices. OpenTelemetry implementation libraries will use an agent/collector model similar to the one in OpenCensus.
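The following Python sketch shows the general shape of such an interface: an abstract `Tracer` with `start_span`, `inject`, and `extract`, plus one made-up vendor implementation. The names loosely echo OpenTracing’s concepts, but this is a hypothetical illustration, not the signature of any real tracing library, and the `x-example-*` header names are invented.

```python
import secrets
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Metadata that must cross process boundaries to link spans together."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    operation_name: str
    context: SpanContext
    parent: Optional[SpanContext] = None


class Tracer(ABC):
    """Hypothetical vendor-neutral tracer interface (illustrative only)."""

    @abstractmethod
    def start_span(self, operation_name: str, child_of: Optional[SpanContext] = None) -> Span: ...

    @abstractmethod
    def inject(self, context: SpanContext, carrier: dict[str, str]) -> None: ...

    @abstractmethod
    def extract(self, carrier: dict[str, str]) -> Optional[SpanContext]: ...


class ExampleVendorTracer(Tracer):
    """One possible implementation; each vendor chooses its own wire format."""

    def start_span(self, operation_name: str, child_of: Optional[SpanContext] = None) -> Span:
        # Continue the parent's trace if there is one; otherwise start a new trace.
        trace_id = child_of.trace_id if child_of else secrets.token_hex(16)
        return Span(operation_name, SpanContext(trace_id, secrets.token_hex(8)), parent=child_of)

    def inject(self, context: SpanContext, carrier: dict[str, str]) -> None:
        carrier["x-example-trace-id"] = context.trace_id
        carrier["x-example-span-id"] = context.span_id

    def extract(self, carrier: dict[str, str]) -> Optional[SpanContext]:
        if "x-example-trace-id" not in carrier or "x-example-span-id" not in carrier:
            return None
        return SpanContext(carrier["x-example-trace-id"], carrier["x-example-span-id"])


tracer: Tracer = ExampleVendorTracer()
client_span = tracer.start_span("GET /checkout")
headers: dict[str, str] = {}
tracer.inject(client_span.context, headers)
server_span = tracer.start_span("handle_checkout", child_of=tracer.extract(headers))
print(server_span.context.trace_id == client_span.context.trace_id)  # True
```

Because the application code depends only on `Tracer`, swapping `ExampleVendorTracer` for another implementation would not require re-instrumenting it, which is the kind of vendor neutrality these specifications aim for.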
OpenTracing and OpenCensus: Competing open standards
Both OpenTracing and OpenCensus give developers the means for vendor-neutral observability. Inspired by Google’s Dapper paper, which describes an early production distributed tracing infrastructure, OpenTracing is a vendor-neutral semantic specification that defines an API for instrumenting code for distributed tracing.
OpenCensus, initially developed and used internally at Google and later released as an open source tool, is a collection of libraries for gathering distributed tracing and application metrics data. OpenCensus also provides automatic context propagation across its supported languages and frameworks.
While both standards did achieve their goal of making observability easier for many, the fundamental problem was that there were two standards. For distributed tracing, it is particularly important to have one standard, because if even a single point in the causal chain is broken, the trace is no longer complete.
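A short Python simulation, with hypothetical service names, shows why a single broken link matters. Each hop either forwards trace context or, if it uses a different standard (or none), drops it, so the next service has to start a fresh trace. One request ends up split into two unrelated traces.

```python
import secrets
from typing import Optional


def call_chain(services: list[tuple[str, bool]]) -> list[tuple[str, str]]:
    """Simulate one request passing through services in order.

    Each entry is (service_name, propagates_context). A service that does not
    propagate context sends the request on without trace headers, so the next
    service cannot join the existing trace and starts a new one.
    Returns (service_name, trace_id) for every hop.
    """
    recorded: list[tuple[str, str]] = []
    incoming_trace_id: Optional[str] = None
    for name, propagates in services:
        # Join the incoming trace if context arrived; otherwise start a new trace.
        trace_id = incoming_trace_id or secrets.token_hex(16)
        recorded.append((name, trace_id))
        incoming_trace_id = trace_id if propagates else None
    return recorded


hops = call_chain([
    ("gateway", True),
    ("orders", False),  # instrumented with an incompatible standard
    ("payments", True),
    ("ledger", True),
])
distinct_traces = {trace_id for _, trace_id in hops}
print(len(distinct_traces))  # 2
```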
OpenTelemetry: A merging of standards
OpenTelemetry aims to combine OpenTracing and OpenCensus into one open standard. The promise of a single standard should not be understated: Without having to choose between two competing standards, developers will have an obvious choice for achieving observability, and will be more likely to build instrumentation directly into their libraries and frameworks. The OpenTelemetry roadmap lays out a deliberate approach to make OpenTelemetry backwards compatible with both OpenTracing and OpenCensus via “compatibility bridges.”
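Conceptually, a compatibility bridge is an adapter: existing instrumentation keeps calling the old API, and a shim translates each call onto the new one. The Python sketch below illustrates the idea with two hypothetical APIs, an old one with `set_tag`/`finish` and a new one with attributes and `end`. All class and method names are chosen for illustration and are not the actual OpenTelemetry bridge API.

```python
from dataclasses import dataclass, field


@dataclass
class NewSpan:
    """Span in a hypothetical 'new' API that stores attributes."""
    name: str
    attributes: dict[str, object] = field(default_factory=dict)
    ended: bool = False

    def end(self) -> None:
        self.ended = True


class NewTracer:
    """Hypothetical tracer for the unified API; records every span it creates."""

    def __init__(self) -> None:
        self.spans: list[NewSpan] = []

    def start_span(self, name: str) -> NewSpan:
        span = NewSpan(name)
        self.spans.append(span)
        return span


class LegacySpanShim:
    """Presents the old span methods on top of a NewSpan."""

    def __init__(self, span: NewSpan) -> None:
        self._span = span

    def set_tag(self, key: str, value: object) -> "LegacySpanShim":
        self._span.attributes[key] = value  # an old "tag" becomes a new "attribute"
        return self

    def finish(self) -> None:
        self._span.end()


class LegacyTracerShim:
    """The bridge: old-style start_span(operation_name) calls keep working unchanged."""

    def __init__(self, tracer: NewTracer) -> None:
        self._tracer = tracer

    def start_span(self, operation_name: str) -> LegacySpanShim:
        return LegacySpanShim(self._tracer.start_span(operation_name))


new_tracer = NewTracer()
legacy = LegacyTracerShim(new_tracer)
span = legacy.start_span("db.query")  # existing, old-style instrumentation
span.set_tag("db.type", "sql")
span.finish()
print(new_tracer.spans)
```

The existing call sites never change; only the object they receive does, which is what lets old instrumentation report into the new system during a migration.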
The OpenTelemetry project has not yet shipped a production-ready release. The schedule currently calls for OpenTelemetry to “reach parity with existing projects” for Java, Node.js, Golang, C#, and Python by early September 2019, and OpenTelemetry will “sunset” old projects in November 2019.
For more specifics of what’s in store for this new project, check out the OpenTelemetry specification on GitHub.
The questions in the table below offer a look at how OpenTelemetry compares to OpenCensus and OpenTracing:
| | OpenTracing | OpenCensus | OpenTelemetry |
|---|---|---|---|
| What is it? | A specification for tracing APIs | A specification and a set of libraries | A set of standard APIs and libraries (not yet in production) |
| What telemetry is included? | Distributed traces | Metrics and distributed traces | Metrics and distributed traces (and eventually logs) |
| How does it work? | Developers choose an available OpenTracing tracer, or write their own, to instrument their code; the data can be visualized by a compatible backend | Developers use an OpenCensus agent, which includes auto-instrumenting packages, to instrument their code; data can be exported to any compatible backend | Developers use an integrated set of libraries and APIs, together with an agent and collector, to instrument application code |

How are spans formatted?

OpenTracing: Each span encapsulates:
- Operation name: Typically a reference to the API call that started the span
- Start/finish timestamp: When the span started and finished
- Tags: User-defined key/value annotations for querying and filtering spans
- Logs: Key/value pairs that capture span-specific logging messages and other details
- SpanContext: Span metadata carried across process boundaries

OpenCensus: Each span encapsulates:
- Name: A string describing what the span does
- SpanID: The span’s unique identifier
- TraceID: The identifier of the trace to which the span belongs
- ParentSpanID: The identifier of the span’s parent, the span that caused it
- StartTime/EndTime: When the span started and finished
- Status: An error model (code and message) describing the span’s final status
- Time event: An event that happened at a fixed point in time during the span
- Link: A pointer to another span, in the same trace or in a different one
- SpanKind: Whether the span covers the client or server side of a remote call, which adds a relationship between spans beyond parent/child
- TraceOptions: A byte on each span indicating whether the span is sampled
- Tracestate: Contains details about the span’s position/ordering

OpenTelemetry: Compatible with both OpenTracing and OpenCensus.
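The OpenCensus span fields listed above can be captured in a plain Python data structure. This is a sketch, not the OpenCensus library’s own classes: field names are adapted to snake_case, timestamps are assumed to be integer nanoseconds, and the `SpanKind` values and example IDs are illustrative. It does show one concrete detail: `TraceOptions` is a single byte whose least significant bit marks the span as sampled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class SpanKind(Enum):
    UNSPECIFIED = "unspecified"
    SERVER = "server"
    CLIENT = "client"


@dataclass
class Status:
    code: int = 0      # gRPC-style code; 0 means OK
    message: str = ""


@dataclass
class TimeEvent:
    time_ns: int
    description: str


@dataclass
class Link:
    trace_id: str      # may belong to the same trace or a different one
    span_id: str


@dataclass
class CensusStyleSpan:
    """Plain-data sketch of the OpenCensus span fields (illustrative only)."""
    name: str
    span_id: str
    trace_id: str
    parent_span_id: Optional[str]
    start_time_ns: int
    end_time_ns: int
    status: Status = field(default_factory=Status)
    time_events: list[TimeEvent] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)
    kind: SpanKind = SpanKind.UNSPECIFIED
    trace_options: int = 0x00  # one byte; the least significant bit is the "sampled" flag
    tracestate: dict[str, str] = field(default_factory=dict)

    @property
    def sampled(self) -> bool:
        return bool(self.trace_options & 0x01)

    @property
    def duration_ms(self) -> float:
        return (self.end_time_ns - self.start_time_ns) / 1e6


span = CensusStyleSpan(
    name="GET /cart",
    span_id="b7ad6b7169203331",
    trace_id="0af7651916cd43dd8448eb211c80319c",
    parent_span_id=None,
    start_time_ns=1_000_000,
    end_time_ns=26_000_000,
    time_events=[TimeEvent(5_000_000, "cache miss")],
    kind=SpanKind.SERVER,
    trace_options=0x01,
)
print(span.sampled, span.duration_ms)  # True 25.0
```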
New Relic is committed to OpenTelemetry
Our founder, Lew Cirne, recently laid out a new vision for New Relic: a “programmable future” in pursuit of a more perfect internet. We believe that OpenTelemetry contributes to that vision and goal. We are actively participating in the formalization of tracing standards such as W3C Trace Context, and we plan to contribute to the OpenTelemetry project.
New Relic launched its first distributed tracing offering in 2018, and most recently unveiled global trace search functionality as part of New Relic One, the industry’s first entity-centric observability platform. But we’re not done: We’re invested in bringing full OpenTracing, OpenCensus, and OpenTelemetry support to our customers, so that they can access and visualize all their correlated telemetry data through New Relic distributed tracing and the New Relic One platform.