This post was originally published on July 31, 2018, and updated on August 29, 2018.
With more than 43 million vehicles viewed per month, Dealer.com’s site performance and availability drive business for thousands of automobile dealerships across the country. If the site is unavailable or returns an error, and shoppers can’t quickly and seamlessly get a quote or do their preliminary research on a new car or truck, the sale might never happen.
In complex software environments like that at Dealer.com, each request typically makes its way through dozens of discrete services. A single problematic service along the path can affect the overall response time for that request, change a good customer experience into a bad one, and potentially cause customers to look elsewhere.
Software teams working in environments where many services cooperate to handle a single request need to deeply understand the performance of every service, both upstream and downstream, so they can more effectively resolve performance issues, measure overall system health, and prioritize high-value areas for improvement.
Distributed tracing now generally available in New Relic APM
Today we’re announcing the general availability of distributed tracing, designed to give software teams working in modern environments an easy way to capture, visualize, and analyze traces through complex architectures, including architectures that use both monoliths and microservices.
“We’ve found New Relic’s distributed tracing to be super easy to integrate with. We simply updated our agent, and all of a sudden we had distributed tracing. It was a great experience.”—Andrew Potter, senior developer at Dealer.com, a Cox Automotive brand
Every customer with a New Relic APM Pro subscription will get this new capability at no extra charge; you just have to update your agents and enable distributed tracing in your configurations. New Relic supports Java, .NET, Node.js, Python, Golang, and Ruby, and assembles trace data collected across polyglot environments into detailed scatter chart and waterfall visualizations. In the coming months, we plan to deliver support for PHP.
Understanding modern software complexity
To understand why distributed tracing is so important, it’s helpful to look at how software environments are changing. Modern software technologies such as cloud platforms, containerization, and container orchestration are helping forward-thinking software organizations more quickly build, scale, and operate business-critical applications.
Traditional software environments typically included just a few large services. When issues cropped up, these relatively simple, monolithic architectures made it easier to identify which service was at fault so developer teams could dig through transactions inside that service to find critical bottlenecks or errors.
Today’s applications are often composed of hundreds or thousands of separate services built on ephemeral infrastructures. Some of these services are large monoliths built with legacy technologies, while others are clusters of smaller dynamic microservices. Despite the many advantages of these distributed architectures, the exploding number of components and their diversity in language, operating environment, and ownership create a huge new burden for teams trying to manage them. Teams can’t effectively work towards resolving issues in a complex system until they understand the full call graph for requests and how the performance characteristics of dependent services are impacting their services. They need a complete view of the entire system.
For DevOps teams, understanding how a downstream service “a few hops away” can create a critical bottleneck for their service is essential for fast problem resolution. Just as important, it also gives teams insight into how to optimize their code. If DevOps teams can’t determine when, why, and how an issue happens, small defects may linger in production until a perfect storm of events aligns and the system breaks all at once. Distributed tracing provides engineers with a detailed view of individual requests so they can pinpoint which parts of the larger system are problematic.
Distributed tracing: Creating a steel thread
As organizations evolve toward more distributed architectures, they soon discover the need for distributed tracing. As New Relic’s Erika Arnold explained in a blog post earlier this year (The Difference Between Tracing, Tracing, and Tracing), we can describe distributed tracing as a way to instrument, propagate context, record, and visualize requests through complex, distributed systems.
Let’s look at how New Relic has implemented a solution that handles all four components of distributed tracing: instrumentation, context propagation, recording, and visualization.
New Relic makes setting up distributed tracing easy by auto-instrumenting application code using language agents that work with hundreds of different libraries and frameworks across multiple languages. New Relic instruments each service involved in the request, whether it’s a monolith or a microservice, creates timings for operations within the service, and sends each measured operation as a “span” to New Relic’s platform.
New Relic automatically adds important troubleshooting information to each span. For example, when New Relic instrumentation creates a span representing a database query operation, it includes the database connection information and SQL query as attributes in the span. Customers using New Relic’s existing agent API to add custom attributes to transactions will see all their information in the trace as well, without changing anything.
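To make the idea of a “span” concrete, here is a minimal Python sketch of timing one operation and attaching troubleshooting attributes to it. It is an illustration only, not New Relic’s agent code or data schema: the `Span` class, the `timed_span` helper, the `recorded` buffer, and the attribute names are all assumptions made for this example.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    """One timed operation inside a service (hypothetical shape for this sketch)."""
    trace_id: str
    name: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None
    start: float = 0.0       # wall-clock start time, seconds since the epoch
    duration: float = 0.0    # elapsed time in seconds
    attributes: Dict[str, str] = field(default_factory=dict)


# Stand-in for whatever buffer an agent would use before sending spans out.
recorded: List[Span] = []


@contextmanager
def timed_span(trace_id: str, name: str, parent_id: Optional[str] = None,
               **attributes: str) -> Iterator[Span]:
    """Time the wrapped block and record it as a span with its attributes."""
    span = Span(trace_id=trace_id, name=name, parent_id=parent_id,
                attributes=dict(attributes))
    span.start = time.time()
    began = time.perf_counter()
    try:
        yield span
    finally:
        # Record the span even if the wrapped operation raised.
        span.duration = time.perf_counter() - began
        recorded.append(span)


trace_id = uuid.uuid4().hex
with timed_span(trace_id, "datastore/select",
                db_host="db.example.internal",
                sql="SELECT * FROM vehicles WHERE id = ?"):
    time.sleep(0.01)  # placeholder for the real database call
```

The point of the sketch is the shape of the data: every span carries the trace ID, its own timing, and the attributes (here a database host and SQL statement) needed to troubleshoot it later.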
New Relic’s distributed tracing solution automatically instruments your services to create a unique Trace ID for each incoming request and propagates the Trace ID and other necessary correlation information as the “trace context” across the entire call. For example, when one service makes a call to another service, New Relic adds the trace context to the HTTP Request header for the next service to use. New Relic’s auto-instrumentation is designed to eliminate the hard work of managing and propagating context, but if you’re using a transport that requires manual instrumentation, the New Relic agent provides an API that allows you to inject and extract the trace context.
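For readers who have not seen context propagation before, the following Python sketch shows the general inject-and-extract pattern over HTTP headers. It does not use the New Relic agent API. The header name `X-Example-Trace-Context`, the payload fields, and the function names are hypothetical and chosen only to illustrate the technique.

```python
import base64
import json
import uuid
from typing import Dict, Optional

# Hypothetical header name for this sketch; a real agent defines its own.
TRACE_HEADER = "X-Example-Trace-Context"


def inject(headers: Dict[str, str], trace_id: str, parent_span_id: str) -> None:
    """Caller side: add the trace context to the outgoing request headers."""
    payload = json.dumps({"trace_id": trace_id, "parent_id": parent_span_id})
    headers[TRACE_HEADER] = base64.urlsafe_b64encode(payload.encode()).decode()


def extract(headers: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Callee side: read the trace context back, or None if absent or unreadable."""
    raw = headers.get(TRACE_HEADER)
    if raw is None:
        return None
    try:
        return json.loads(base64.urlsafe_b64decode(raw.encode()))
    except ValueError:
        # Covers malformed base64 and malformed JSON alike.
        return None


def start_or_continue_trace(headers: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Continue the caller's trace if one was propagated, else start a new one."""
    context = extract(headers)
    if context is None:
        return {"trace_id": uuid.uuid4().hex, "parent_id": None}
    return context


# Service A makes a call to service B.
outgoing: Dict[str, str] = {}
inject(outgoing, trace_id="4bf92f3577b34da6", parent_span_id="00f067aa0ba902b7")

# Service B receives the request and joins the same trace.
joined = start_or_continue_trace(outgoing)
```

Because the callee reuses the caller’s trace ID and records the caller’s span as its parent, spans from both services can later be stitched into one trace.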
New Relic agents send trace data to our Software-as-a-Service (SaaS) platform, where we ingest and store the data in New Relic Insights. Perhaps the least glamorous part of the system, this is where the hardest work gets done. New Relic already ingests and stores massive amounts of metrics, events, and other telemetry in a scalable platform so our customers can focus on building their business, not managing their monitoring platform. Because New Relic stores trace data for you in Insights, you can query trace data directly and create custom dashboards.
Finally, it all comes together in the New Relic APM user experience through trace visualizations designed to help you quickly understand why a specific request is slow, where an error originated, and where you can optimize your code to improve your customer’s experience. We do this by providing an advanced trace filtering capability and trace visualizations that bring together distributed tracing and New Relic APM.
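As a rough illustration of what a waterfall view does with recorded spans, the Python sketch below groups spans by parent and prints them as an indented, time-ordered tree. The `SpanRow` fields, the sample services, and the timings are invented for this example and do not reflect how New Relic renders traces internally.

```python
from typing import Dict, List, NamedTuple, Optional


class SpanRow(NamedTuple):
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    name: str
    start_ms: int             # offset from the start of the trace
    duration_ms: int


def waterfall(spans: List[SpanRow]) -> List[str]:
    """Return one text line per span, indented by depth and ordered by start time."""
    children: Dict[Optional[str], List[SpanRow]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.service} {span.name} "
                         f"[{span.start_ms}ms +{span.duration_ms}ms]")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Invented sample trace: a web request that calls two downstream services.
example_spans = [
    SpanRow("a", None, "web", "GET /quote", 0, 120),
    SpanRow("c", "a", "inventory", "GET /vehicles", 40, 70),
    SpanRow("b", "a", "pricing", "POST /price", 5, 30),
    SpanRow("d", "c", "inventory", "datastore/select", 45, 60),
]

for line in waterfall(example_spans):
    print(line)
# web GET /quote [0ms +120ms]
#   pricing POST /price [5ms +30ms]
#   inventory GET /vehicles [40ms +70ms]
#     inventory datastore/select [45ms +60ms]
```

Even in this toy trace, the layout makes it easy to see that most of the request’s 120 ms is spent in the inventory service’s database query.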
This release is just the first of many, and is part of a larger product roadmap designed to help teams understand complex software environments using distributed tracing. Our roadmap includes:
- Updates to our PHP agent to support distributed tracing
- Integration with New Relic Browser and New Relic Mobile to connect client and backend performance
- Improved integration with New Relic Synthetics and Infrastructure
- Support for OpenTracing and OpenCensus
Getting started with New Relic’s distributed tracing
2. Deploy the latest APM agent to each service involved in the call path you’re interested in, and enable distributed tracing in the agent config.
3. The new “Distributed tracing” menu in New Relic APM will take you to the main “trace listing” view, where you’ll be able to quickly identify slow traces and traces with errors. You can adjust the “time picker” to change the window of traces you want to view, and you can use advanced filtering to find traces by a combination of attributes.
4. Dive into distributed traces to see how long each span takes. You can click into each span to see historical performance charts and associated attributes that layer in the context you need to understand and troubleshoot issues. You can also jump right into the correlated APM overview page for a specific service involved in the trace. Here you can see deeper transaction information and stack traces to resolve issues in that service.
5. Build queries to create widgets inside of New Relic Insights to track the spans that are important to your team. Leverage attributes to facet your queries and filter your charts.
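As a sketch of the kind of query step 5 describes, the Python helper below assembles an NRQL string that facets span durations by an attribute. It assumes span data is queryable as a `Span` event type with `duration`, `name`, and `appName` attributes; check the attribute documentation for the names available in your account. The function name and its parameters are hypothetical.

```python
def span_duration_query(app_name: str, facet: str = "name",
                        since: str = "1 hour ago") -> str:
    """Build an NRQL query for average and 95th-percentile span duration.

    Assumes a `Span` event type with `duration` and `appName` attributes.
    """
    if "'" in app_name:
        # Keep the sketch simple: refuse names that would need escaping.
        raise ValueError("app_name must not contain single quotes")
    return (
        "SELECT average(duration), percentile(duration, 95) FROM Span "
        f"WHERE appName = '{app_name}' FACET {facet} SINCE {since}"
    )


# "quote-service" is an invented application name for this example.
query = span_duration_query("quote-service")
print(query)
```

Pasting the resulting string into the Insights query bar would be one way to turn it into a dashboard widget; changing `facet` to another span attribute changes how the chart is grouped.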
Here at New Relic, we understand that as modern software organizations evolve their environments, things are becoming more complex and difficult to understand and troubleshoot. New Relic’s depth of automatic instrumentation is built to make using distributed tracing super easy, so you can quickly understand why a specific request is slow, where an error originated, and where you can optimize your code to improve your customer’s experience.
You can find out much more about distributed tracing in the New Relic documentation:
- Introduction: https://docs.newrelic.com/docs/introduction-distributed-tracing
- Transition guide: https://docs.newrelic.com/docs/transition-guide-distributed-tracing
- Enable: https://docs.newrelic.com/docs/enable-distributed-tracing
- Using data: https://docs.newrelic.com/docs/understand-use-distributed-tracing-data
- Attributes: https://docs.newrelic.com/docs/distributed-tracing-attributes
- Sampling: https://docs.newrelic.com/docs/explanation-how-sampling-works-distributed-tracing-data
Want to be in the loop as we roll out new enhancements to our distributed tracing? Sign up here! And please be sure to post any questions or comments about using this new feature in the APM category in the New Relic Online Technical Community and tag it with #DistributedTracing.
This post contains “forward-looking” statements, as that term is defined under the federal securities laws, including but not limited to future roadmap for distributed tracing as well as the benefits of such features. The achievement or success of the matters covered by such forward-looking statements are based on New Relic’s current assumptions, expectations, and beliefs and are subject to substantial risks, uncertainties, assumptions, and changes in circumstances that may cause New Relic’s actual results, performance, or achievements to differ materially from those expressed or implied in any forward-looking statement. Further information on factors that could affect New Relic’s financial and other results and the forward-looking statements in this press release / post is included in the filings New Relic makes with the SEC from time to time, including in New Relic’s most recent Form 10-K, particularly under the captions “Risk Factors” and “Management’s Discussion and Analysis of Financial Condition and Results of Operations.” Copies of these documents may be obtained by visiting New Relic’s Investor Relations website at http://ir.newrelic.com or the SEC’s website at www.sec.gov. New Relic assumes no obligation and does not intend to update these forward-looking statements, except as required by law.