As a service, our Telemetry Data Platform allows you to store and analyze metrics, events, logs, and traces, whether they come from New Relic agents or third-party sources, such as Prometheus. Traditionally, New Relic users have queried their data using the New Relic Query Language (NRQL). As we spoke with users about Prometheus, however, we learned that PromQL support would allow them to use their existing Grafana dashboards and enable new users to adopt our platform without learning NRQL.
Prometheus is an excellent tool for monitoring metrics from myriad infrastructure and other systems, but its lack of a long-term, durable, and reliable storage solution presents a challenge. To address that challenge, we built support for Prometheus’ remote write capability, which lets you send your Prometheus metrics to the Telemetry Data Platform.
As we built our Prometheus support, we were fortunate to partner with Julius Volz, co-founder of the Prometheus project. While his recent blog post (PromQL Compatibility Across Vendors) gives us a compatibility score of 31%, our PromQL implementation supports 99.5% of the top Grafana dashboard queries, and we believe it will support the majority of your use cases. This blog post discusses our approach to PromQL support, when and why our implementation differs from the behavior PromLabs’ tests expect, and most importantly, how New Relic supports your PromQL needs.
Built to support the most common queries
In developing our PromQL solution, we prioritized supporting the queries that customers use the most. Because Grafana is such a popular tool for exploring and visualizing Prometheus metrics, we identified the most commonly used community dashboards and the PromQL queries they are built on. This collection of more than 600 PromQL queries comes from the most downloaded Grafana dashboards, which together account for 7.8 million downloads. We expected that supporting these queries would cover the vast majority of our customers’ use cases. As a result, when we launched our support for PromQL in August 2020, our PromQL engine was able to parse, translate, and correctly execute 99.5% of the queries in this collection. Coincidentally, PromLabs open-sourced its promql-compliance-tester tool the day after we achieved these results.
PromLabs’ blog makes an important point that should put New Relic’s 31% score into perspective: “…the numeric scores alone paint a limited picture. They don’t necessarily tell you how impactful implementation errors are, nor how many distinct behavioral differences there are.” Implementation differences don’t necessarily affect the user experience. To put a finer point on it: if our PromQL implementation returns a result that differs only slightly from what the PromLabs test suite expects, that query fails the test, even though the result is just as useful for your real-world troubleshooting needs.
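To make the pass/fail distinction concrete, here is a minimal sketch, using invented sample values, of how a strict equality check treats two query results that differ only by floating-point noise, compared with a tolerance-based check. It is illustrative only and does not reproduce the comparison logic of promql-compliance-tester; `strict_match` and `tolerant_match` are hypothetical helpers.

```python
import math

# Hypothetical results for the same query from two engines (values are invented).
expected = [0.1 + 0.2, 1.5, 42.0]
actual = [0.3, 1.5, 42.0]

def strict_match(a, b):
    """Exact element-wise equality: any difference, however small, is a failure."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))

def tolerant_match(a, b, rel_tol=1e-9):
    """Element-wise comparison that accepts tiny relative differences."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

print(strict_match(expected, actual))    # False: 0.1 + 0.2 != 0.3 in binary floating point
print(tolerant_match(expected, actual))  # True
```

A binary pass/fail score records both of these as the same kind of outcome, regardless of how small the gap is.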
Metrics matter, but so does scalable, long-term storage
While Prometheus and the Telemetry Data Platform have similarities (support for ad-hoc schemas and flexible queries without user-specified indices), they follow separate design principles that require different trade-offs. Query languages, query execution models, data models, and storage formats are often intrinsically linked through design constraints.
The PromLabs compliance test highlights a subtle yet fundamental point: support for PromQL is tightly coupled to the Prometheus data model and the Prometheus storage implementation. Because our strategic objectives differ, we’ve chosen to make different design trade-offs than Prometheus; notably, we’re focused on:
- Telemetry data types: The Telemetry Data Platform supports metrics, events, logs, and traces, whereas Prometheus only supports metrics.
- Storage systems: The Telemetry Data Platform provides scalable and cost-effective, long-term storage, whereas Prometheus does not.
- Computing deltas: The Telemetry Data Platform computes deltas at ingest time rather than at query time, providing faster results over larger data sets. Another advantage of this approach is that it avoids undesired results in some functions at query time (see below).
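The idea of computing deltas at ingest time can be sketched as follows. This is a minimal illustration, assuming each incoming sample is a (timestamp, cumulative value) pair for one series and that any decrease in value means the counter was reset. The function name and the reset rule are our own assumptions, not the Telemetry Data Platform’s actual ingest code.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)

def cumulative_to_deltas(samples: List[Sample], previous: Optional[float] = None) -> List[Sample]:
    """Convert cumulative counter samples for one series into per-sample deltas.

    `previous` is the last cumulative value seen for this series before this batch
    (None if the series is new). A drop in value is treated as a counter reset, in
    which case the new value itself is taken as the increase since the reset.
    """
    deltas = []
    for ts, value in samples:
        if previous is None:
            # First sample of a new series: no baseline yet, so no delta is emitted.
            previous = value
            continue
        delta = value - previous if value >= previous else value  # reset: restarted from 0
        deltas.append((ts, delta))
        previous = value
    return deltas

print(cumulative_to_deltas([(0, 5), (15, 8), (30, 2), (45, 6)]))
# [(15, 3), (30, 2), (45, 4)]
```

Once deltas are stored, a rate-like query over a window only has to add up the deltas that fall inside it.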
Adhering strictly to PromQL semantics forces a vendor to make the same trade-offs Prometheus makes, requiring a storage interface that supports the exact same data model. However, matching the query execution model and storage interface limits a platform’s flexibility to make different design trade-offs and ultimately limits its ability to provide additional benefits to users. One such trade-off, computing rates on absolute vs. cumulative counters at query time, has been the subject of heated debate in the open source community. Specifically, the rate() and increase() functions calculate deltas between disjoint value pairs, but in doing so, they discard part of the data and extrapolate the rest, which can return unexpected results. The Telemetry Data Platform avoids this problem because it computes deltas at ingest time rather than query time, using every data point in the relevant timeframe.
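The sketch below contrasts the two approaches on a toy series. `prom_style_increase` is a simplified function modeled on the extrapolation logic Prometheus applies in increase(). It omits counter-reset handling, and the exact details vary between Prometheus versions. `delta_sum` simply adds up deltas that were computed at ingest time and fall inside the window. Both functions and all sample values are illustrative assumptions, not code from either product.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)

def prom_style_increase(samples: List[Sample], start: float, end: float) -> Optional[float]:
    """Simplified sketch of a Prometheus-style extrapolated increase() over [start, end].

    Assumes samples are sorted, lie inside the window, and contain no counter resets.
    """
    if len(samples) < 2:
        return None
    (first_t, first_v), (last_t, last_v) = samples[0], samples[-1]
    result = last_v - first_v
    sampled = last_t - first_t
    avg_gap = sampled / (len(samples) - 1)
    threshold = avg_gap * 1.1

    to_start = first_t - start
    to_end = end - last_t
    # A counter cannot go below zero, so never extrapolate back past that point.
    if result > 0 and first_v >= 0:
        to_start = min(to_start, sampled * (first_v / result))

    # Extrapolate to a window edge if it is close; otherwise only by half a sample gap.
    interval = sampled
    interval += to_start if to_start < threshold else avg_gap / 2
    interval += to_end if to_end < threshold else avg_gap / 2
    return result * (interval / sampled)

def delta_sum(deltas: List[Sample], start: float, end: float) -> float:
    """Sum ingest-time deltas whose timestamps fall inside [start, end]."""
    return sum(d for ts, d in deltas if start <= ts <= end)

# A counter scraped every 15 s that rises by exactly 1 per scrape.
cumulative = [(10, 10), (25, 11), (40, 12)]
# Ingest-time deltas; the first assumes a previous scrape at t=-5 with value 9.
deltas = [(10, 1), (25, 1), (40, 1)]

print(prom_style_increase(cumulative, 0, 60))  # about 3.17: non-integer for an integer counter
print(delta_sum(deltas, 0, 60))                # 3
```

The extrapolated value is a reasonable estimate, but it cannot be reconciled with the whole-number increments the counter actually recorded. The delta sum uses every increment whose timestamp falls in the window.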
Prometheus uses a pull-based collection model and does not provide distributed, scalable storage. As a result, Prometheus uses cumulative counters, computing deltas at query time in rate-like queries to offer graceful degradation of metric resolution and to avoid data loss in the event of delivery failures. Given Prometheus’ single-threaded execution on a single node, this is an appropriate trade-off.
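A small, hypothetical example shows why cumulative counters suit a pull model. If one scrape is missed, the next cumulative reading still accounts for every increment. In a naive push pipeline that sends deltas without retries, a lost delta takes its increment with it. The increment values and the index of the failed delivery are invented for illustration.

```python
# True per-interval increments of some counter (hypothetical values).
increments = [3, 1, 4, 1, 5]

# Cumulative readings the exporter would expose after each interval.
cumulative = []
total = 0
for inc in increments:
    total += inc
    cumulative.append(total)

lost = 2  # index of the scrape or delivery that fails

# Pull with cumulative counters: the missed reading is skipped, but the last
# reading still carries the full running total.
received_cumulative = [v for i, v in enumerate(cumulative) if i != lost]
total_from_cumulative = received_cumulative[-1]

# Naive push of deltas with no retry: the lost increment is gone for good.
received_deltas = [d for i, d in enumerate(increments) if i != lost]
total_from_deltas = sum(received_deltas)

print(total_from_cumulative)  # 14
print(total_from_deltas)      # 10
```

The missed reading costs resolution, since you no longer know in which interval the 4 occurred, but it does not cost the total. This is the graceful degradation described above.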
In contrast, the Telemetry Data Platform uses a push-based delivery model as part of a highly available and scalable platform with a multi-threaded query execution model spread across multiple nodes. Additionally, the Telemetry Data Platform’s data model is optimized for delta counters to provide cost-efficient and scalable storage and a more intuitive query experience for rate-like operations.
Elsewhere, the differences between Prometheus and the Telemetry Data Platform are less pronounced. We’ve built a guide to convert PromQL queries into NRQL, so you can continue asking the same types of questions of your Prometheus metrics when they’re stored in the Telemetry Data Platform.
One place for all your telemetry data
Beyond Prometheus, the Telemetry Data Platform can ingest dimensional metric data from virtually any source, and that data can be analyzed, visualized, and correlated with events, logs, and traces. This flexible and schemaless data model gives you the ability to diagnose and resolve issues quickly, no matter the system or application. And the language you use to interact with telemetry data is an important part of the troubleshooting experience—NRQL provides an easy-to-learn and familiar SQL-like syntax. But if you’re just looking to query Prometheus metrics, PromQL syntax works just as well.
Even teams focused solely on Prometheus metrics struggle with federated Prometheus servers, because they lack unified storage for querying and analyzing their systems’ performance. In contrast, the Telemetry Data Platform gives you access to the data from all of your Prometheus servers in one place, whether those servers run in a sharded configuration or as high-availability replicas whose results are deduplicated. This lets you query, visualize, and alert on metric data across all Prometheus instances, and it’s easier to maintain than a federated Prometheus configuration.
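For the high-availability case, deduplication comes down to recognizing that two series differ only by a label that identifies the replica. Below is a minimal sketch, assuming a hypothetical `replica` label and a simple keep-the-first-replica policy. It is not a description of how the Telemetry Data Platform actually deduplicates.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[Tuple[str, str]]
Points = List[Tuple[float, float]]  # [(timestamp, value), ...]

def dedup_replicas(series: List[Tuple[Labels, Points]],
                   replica_label: str = "replica") -> Dict[Labels, Points]:
    """Collapse series that differ only by the replica label, keeping the first replica seen."""
    result: Dict[Labels, Points] = {}
    for lbls, points in series:
        key = frozenset((k, v) for k, v in lbls if k != replica_label)
        result.setdefault(key, points)  # later replicas of the same series are ignored
    return result

def labels(**kv: str) -> Labels:
    """Build an immutable, hashable label set."""
    return frozenset(kv.items())

incoming = [
    (labels(__name__="up", job="api", replica="a"), [(0, 1.0), (15, 1.0)]),
    (labels(__name__="up", job="api", replica="b"), [(0, 1.0), (15, 1.0)]),
    (labels(__name__="up", job="db", replica="a"), [(0, 0.0), (15, 1.0)]),
]

deduped = dedup_replicas(incoming)
print(len(deduped))  # 2: one series per job, regardless of which replica sent it
```

Shards, by contrast, send disjoint sets of series, so the same function leaves sharded data untouched.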
Query Prometheus in New Relic One and Grafana
The Telemetry Data Platform enables you to query your Prometheus metrics using PromQL syntax directly inside New Relic One and Grafana. To make this happen, we translate your PromQL query into a NRQL query.
To visualize the data, you have two options:
- Within New Relic One: Use PromQL-style mode or NRQL in the chart builder, as well as dashboards and custom applications.
- Within Grafana: Configure the Telemetry Data Platform as a Prometheus data source in Grafana.
We’re constantly improving our support for PromQL, so expect support for even more functions in the future. If there’s something specific that you’d like to see, contact your account team (paid accounts) or enter a request in New Relic Explorer’s Hub (free tier). If you’d like to learn more about the PromQL features we support, check out our docs, including details about translating PromQL into NRQL.