Harkamal Singh, a Manager of Programmability in the Customer Solutions group, contributed to this post.

As the number of telemetry data sources continues to grow, software teams are finding gaps in their monitoring strategies. To mitigate this, more and more teams have turned to open source monitoring tools for collecting metrics, traces, and other telemetry data.

However, many open source telemetry tools require teams to operate and manage multiple complex layers—one for traces, one for metrics, one for logs, one for visualizations, and a database to store it all. These tools are also limited in terms of high availability, scale, and long-term storage.

As part of our continued effort to give you a single platform where you can ingest, analyze, visualize, and alert on any telemetry source, we’ve released many APIs and SDKs for ingesting metrics into New Relic from virtually any service you’ve built. And now, using an output plugin built on the Golang Telemetry SDK, you can add Telegraf to your expanding toolkit.

In this post, we’ll show you how to ingest data with Telegraf and send it to New Relic as custom metrics via the New Relic output plugin for Telegraf.

About Telegraf

Telegraf is a server-based agent that collects metrics from inputs (applications, databases, message queues, and more) and writes them into outputs, like New Relic’s Metric API. Telegraf’s plugin-driven architecture and lightweight footprint (it requires no external dependencies like npm, pip, or gem) make it a popular tool for collecting metrics from a variety of input sources; in fact, Telegraf has more than 200 integrations.

Since Telegraf is a compiled Go executable, and all plugins are compiled directly into the build, you don’t need to install any of the integration plugins—they are built in. Instead, you need only ensure the Telegraf version that you use has the plugins you need.

Here’s a quick look at some of the plugin types available from Telegraf:

Input source plugins

Most of Telegraf’s input plugins can be grouped as follows:

Beyond providing instrumentation for such commercial and open source packages, Telegraf shines in its ability to ingest metrics from generic data sources, such as files and sockets. (We’ll see the benefits of this in our example configuration below.)

Telegraf supports the following generic sources:

  • File – Consume the contents of an entire file
  • Tail – Tail a file
  • Socket Listener – Receive input from a socket over TCP or UDP
  • HTTP Listener – Receive POSTs over HTTP
  • HTTP Poller – Periodically get data from a configured endpoint
  • Exec – Execute a process and get metrics from stdout
  • Execd – Execute a daemonized process

Input source formats

Using these generic input sources, you can configure generic data formats to receive and send data. When configuring Telegraf, consider the variety of input and output formats it supports:

  • JSON: Parses a JSON object or an array of objects into metric fields.
  • CSV: Creates metrics from a document containing comma-separated values.
  • Graphite: Translates Graphite dot buckets directly into Telegraf measurement names, with a single value field, and without any tags.
  • CollectD: Parses the collectd binary network protocol.
  • Logfmt: Parses data in logfmt format.
  • Dropwizard: Parses the JSON Dropwizard representation of a single Dropwizard metric registry.
  • Grok: Parses line-delimited data using a language similar to regex.
  • Nagios: Parses the output of Nagios plugins.

Processor and Aggregator plugins

In addition to input and output plugins, Telegraf also supports processor and aggregator plugins, which provide powerful intermediate enrichment, transformation, and aggregation of the data coming through your system.

Our example won’t use these plugin types, but we highly encourage you to check out the Telegraf documentation to learn how to use these types in your workflows.

You can envision the whole pipeline as follows:

Telegraf pipeline

Sending metrics from Telegraf to New Relic

Now, let’s walk through an example where we ingest log data from a message queue and send it to New Relic as custom metrics.

If you want to follow along, be sure to have Telegraf version 1.15.0 or later installed. You’ll also need a New Relic Insert API key for sending data to the Metric API.

Our example will use the following components:

  • Source: We’ll use the Tail input plugin to tail log lines from an input text file.
  • Format: We’ll use JSON and take advantage of some of the features that allow us to refine the pipeline’s data.
  • Output: We’ll use the new New Relic output plugin.

Note: In this example, we’re using a legacy message queue system that doesn’t have a built-in input plugin for Telegraf; if we were using ActiveMQ or RabbitMQ, we could use those input plugins directly.

Here’s what our pipeline looks like:

Telegraf pipeline to New Relic

Again, we’re bypassing processors and aggregators, but they are options if we want to take advantage of them.

Step 1: Define a metric

Before we go any further, it’ll help to understand a bit about InfluxDB’s line protocol. An input plugin’s core function is to ingest one of a variety of formats and emit a logical representation of a line protocol point. A point consists of a measurement name (think of it as a namespace), a set of one or more fields (usually numeric metric values that can be thought of as gauges, timers, or counters), and a set of tags. These tags are dimensional metadata that allow you to facet, group, and otherwise aggregate metrics. Within a given measurement, the number of series is determined by the number of unique combinations of tag values, giving you a sense of your metrics’ cardinality.

So if our example message queue outputs log files with the following JSON format…

{"timestamp": 1588100043050, "metricName": "messageQueue.operation", "product": "Legacy MQ", "productVersion": "8.0.1.11", "connectionId": "64688F5E01E47020", "clientIpAddress": "36.1.90.2", "clientName": "JMS Producer", "queueManager": "QM1", "queueTopic": "MY.QUEUE.1", "putMessage": 3605}

{"timestamp": 1588100044203, "metricName": "messageQueue.operation", "product": "Legacy MQ", "productVersion": "8.0.1.11", "connectionId": "64688F5E01F0F77A", "clientIpAddress": "36.1.90.2", "clientName": "JMS Consumer","queueManager": "QM1", "queueTopic": "MY.QUEUE.1", "getMessage": 2000}

{"timestamp": 1588100060150, "metricName": "messageQueue.operation", "product": "Legacy MQ", "productVersion": "8.0.1.11", "connectionId": "64688F5E01E47020", "clientIpAddress": "36.1.90.5", "clientName": "Fin Producer", "queueManager": "QM1", "queueTopic": "MY.QUEUE.2", "putMessage": 400}

{"timestamp": 1588100065285, "metricName": "messageQueue.operation", "product": "Legacy MQ", "productVersion": "8.0.1.11", "connectionId": "64688F5E01F0F77A", "clientIpAddress": "36.1.90.7", "clientName": "Fin Consumer", "queueManager": "QM1", "queueTopic": "MY.QUEUE.2", "getMessage": 400}

… we would parse each metric in the following way:

  • Measurement Name: derived from the JSON field metricName
  • Tags: derived from the following JSON fields:
    • product
    • productVersion
    • connectionId
    • clientIpAddress
    • clientName
    • queueManager
    • queueTopic

Fields will be derived from any other JSON field (except for timestamp, which will not be included in the field set). In our example file, we have two different fields that may occur: getMessage and putMessage.
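To make the mapping concrete, here is a minimal Python sketch of the same idea. It is not how Telegraf’s JSON parser is implemented; the helper names (escape, parse_record, to_line_protocol, series_key) are hypothetical, and we assume numeric values are rendered as floats and timestamps as nanoseconds in the line protocol text.

```python
import json

# Tag keys from the list above; any other numeric value becomes a field.
TAG_KEYS = ["product", "productVersion", "connectionId", "clientIpAddress",
            "clientName", "queueManager", "queueTopic"]

# First record from the sample log shown earlier.
SAMPLE = ('{"timestamp": 1588100043050, "metricName": "messageQueue.operation", '
          '"product": "Legacy MQ", "productVersion": "8.0.1.11", '
          '"connectionId": "64688F5E01E47020", "clientIpAddress": "36.1.90.2", '
          '"clientName": "JMS Producer", "queueManager": "QM1", '
          '"queueTopic": "MY.QUEUE.1", "putMessage": 3605}')


def escape(text, chars=",= "):
    """Backslash-escape the characters line protocol treats as delimiters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def parse_record(line):
    """Split one JSON log line into measurement name, tags, fields, and timestamp."""
    record = json.loads(line)
    name = record.pop("metricName")
    timestamp_ms = record.pop("timestamp")  # used as the metric time, not as a field
    tags = {key: str(record.pop(key)) for key in TAG_KEYS if key in record}
    fields = {key: float(value) for key, value in record.items()
              if isinstance(value, (int, float)) and not isinstance(value, bool)}
    return {"name": name, "tags": tags, "fields": fields, "timestamp_ms": timestamp_ms}


def to_line_protocol(metric):
    """Render the metric as a single line protocol string."""
    measurement = escape(metric["name"], ", ")
    tags = "".join(f",{escape(k)}={escape(v)}" for k, v in sorted(metric["tags"].items()))
    fields = ",".join(f"{escape(k)}={v}" for k, v in sorted(metric["fields"].items()))
    timestamp_ns = metric["timestamp_ms"] * 1_000_000
    return f"{measurement}{tags} {fields} {timestamp_ns}"


def series_key(metric):
    """A series is identified by the measurement plus its unique tag set."""
    return (metric["name"], tuple(sorted(metric["tags"].items())))


if __name__ == "__main__":
    print(to_line_protocol(parse_record(SAMPLE)))
```

Running series_key over the four sample records yields four distinct keys, because each record has a different combination of connectionId, clientIpAddress, clientName, and queueTopic. That is the cardinality this small sample would produce for the messageQueue.operation measurement.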

Step 2: Configure the Telegraf agent and plugin

Now, we’ll configure the Telegraf agent and the input plugin.

Configuring the agent

Before configuring inputs and output plugins, we need to set some basic parameters related to how Telegraf fetches, batches, and flushes data. It’s beyond this post’s scope to provide full optimization details, but note that all input plugins are subject to these collection parameters.

(See the Telegraf documentation for a full overview of configuration instructions and options.)

# Configuration for telegraf agent
[agent]
  interval = "1s"               # Collect input every 1 second
  metric_batch_size = 1000      # Send 1000 metrics at a time
  metric_buffer_limit = 10000   # Don't let the internal buffer grow past 10000
  flush_interval = "10s"        # Flush every 10s or when we have at least 1000 metrics in the buffer
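If it helps to picture how the batch size and buffer limit interact, the following Python sketch models a bounded buffer. It is a simplified illustration under our own assumptions (the oldest metrics are discarded once the limit is reached, and a write is due as soon as a full batch is queued), not Telegraf’s actual implementation; the OutputBuffer class is hypothetical.

```python
from collections import deque


class OutputBuffer:
    """Toy model of an output buffer governed by a batch size and a hard limit."""

    def __init__(self, batch_size=1000, buffer_limit=10000):
        self.batch_size = batch_size
        self.buffer_limit = buffer_limit
        self.queue = deque()
        self.dropped = 0

    def add(self, metric):
        """Queue a metric; return True when at least one full batch is waiting."""
        if len(self.queue) >= self.buffer_limit:
            # Assumption: when the buffer is full, the oldest metric is discarded.
            self.queue.popleft()
            self.dropped += 1
        self.queue.append(metric)
        return len(self.queue) >= self.batch_size

    def next_batch(self):
        """Remove and return up to batch_size metrics, oldest first."""
        count = min(self.batch_size, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]


if __name__ == "__main__":
    buffer = OutputBuffer(batch_size=1000, buffer_limit=10000)
    for n in range(2500):
        buffer.add(n)
    while buffer.queue:
        print(len(buffer.next_batch()))
```

In this model, a flush timer would simply call next_batch on every flush_interval tick, which is why a partially filled batch still goes out even when traffic is light.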

Configuring the input plugin

Here’s what our input plugin configuration looks like:

# Config for legacy MQ metrics

[[inputs.tail]]

  # The file we want to tail
  files = ["/var/log/legacy-mq-metrics.log"]

  # Don’t reach back to the beginning (it may be a ton of data)
  from_beginning = false

  # This plugin automatically adds this tag, we don’t want to emit it.
  tagexclude = ["path"]

  # Method used to watch for file changes. Can watch by "inotify" or "poll"
  watch_method = "poll"
  data_format = "json"

  # This will be our Measurement Name
  json_name_key = "metricName"

  # Use the timestamp field for our metric timestamp. If omitted, Telegraf will insert one automatically.
  json_time_key = "timestamp"
  json_time_format = "unix_ms"

  tag_keys = ["product",
              "productVersion",
              "connectionId",
              "clientIpAddress",
              "clientName",
              "queueManager",
              "queueTopic"]

Step 3: Test the input plugin

To make sure we won’t be sending “junk” to our metrics backend (i.e., New Relic), we can test our input plugin by configuring a file output. This simple configuration will allow us to see if our metrics are handled properly based on our input configuration.

# Send telegraf metrics to a file for debugging
[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["/var/log/metrics.out"]
  use_batch_format = false
  data_format = "json"
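If you don’t have a real message queue writing to the tailed file, one way to exercise this test is to append sample records yourself. The Python sketch below is only an illustration: make_record and append_record are hypothetical helpers, and the tag values are copied from the sample log lines above.

```python
import json
import time


def make_record(operation, count, client_name, queue_topic, timestamp_ms=None):
    """Build one record shaped like the sample log lines (values are illustrative)."""
    if operation not in ("getMessage", "putMessage"):
        raise ValueError("operation must be 'getMessage' or 'putMessage'")
    return {
        # Millisecond epoch timestamp, matching json_time_format = "unix_ms".
        "timestamp": int(time.time() * 1000) if timestamp_ms is None else timestamp_ms,
        "metricName": "messageQueue.operation",
        "product": "Legacy MQ",
        "productVersion": "8.0.1.11",
        "connectionId": "64688F5E01E47020",
        "clientIpAddress": "36.1.90.2",
        "clientName": client_name,
        "queueManager": "QM1",
        "queueTopic": queue_topic,
        operation: count,
    }


def append_record(path, record):
    """Append the record as a single JSON line, the way the tailed log grows."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
```

For example, append_record("/var/log/legacy-mq-metrics.log", make_record("getMessage", 2000, "JMS Consumer", "MY.QUEUE.1")) adds one line to the file our input plugin tails. Because from_beginning is set to false, only lines appended after Telegraf starts are picked up.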

After restarting Telegraf, we get the desired output—a successful test.

{"fields":{"putMessage":3605},"name":"messageQueue.operation","tags":{"clientIpAddress":"36.1.90.2","clientName":"JMS Producer","connectionId":"64688F5E01E47020","product":"Legacy MQ","productVersion":"8.0.1.11","queueManager":"QM1","queueTopic":"MY.QUEUE.1"},"timestamp":1588448293}
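Rather than eyeballing the file, we could also check each output line with a few lines of Python. This is a sketch under our own assumptions about what counts as “junk” for this example (wrong measurement name, missing tags, a leftover path tag, or non-numeric fields); check_metric_line is a hypothetical helper.

```python
import json

# Tags we configured with tag_keys in the input plugin.
EXPECTED_TAGS = {"product", "productVersion", "connectionId", "clientIpAddress",
                 "clientName", "queueManager", "queueTopic"}


def check_metric_line(line):
    """Return a list of problems found in one JSON line from the file output."""
    metric = json.loads(line)
    problems = []
    if metric.get("name") != "messageQueue.operation":
        problems.append(f"unexpected name: {metric.get('name')!r}")
    tags = metric.get("tags", {})
    missing = EXPECTED_TAGS - set(tags)
    if missing:
        problems.append(f"missing tags: {sorted(missing)}")
    if "path" in tags:
        problems.append("tag 'path' should have been excluded")
    fields = metric.get("fields", {})
    if not fields:
        problems.append("no fields")
    for key, value in fields.items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"field {key!r} is not numeric")
    return problems
```

An empty list means the line looks the way our input configuration intended. Running the function over every line of /var/log/metrics.out gives a quick pass or fail signal before we point Telegraf at New Relic.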

Step 4: Configure the New Relic output plugin

Now we’re ready to configure our New Relic output plugin to send our Telegraf metrics to New Relic. We can have multiple output configurations, so we’ll leave the file output config for now.

Note: If you’re following the example, don’t forget to add your Insert API key where indicated.

[[outputs.newrelic]]
  insights_key = "[INSERT API KEY]"

  # We don't need to send this as a field. The plugin will send a proper timestamp via the Metric API.
  fielddrop = ["timestamp"]
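To get a feel for what ends up in New Relic, the sketch below converts a Telegraf metric (in the JSON shape we saw in the file output) into a Metric API style payload. It is an approximation, not the plugin’s code: we assume each field becomes a gauge named measurement.field, which matches the metric names queried later in this post, and to_metric_api_payload is a hypothetical helper.

```python
def to_metric_api_payload(telegraf_metrics):
    """Flatten Telegraf metrics into one Metric API style payload (a sketch)."""
    data_points = []
    for metric in telegraf_metrics:
        for field, value in metric["fields"].items():
            # Only numeric fields can become metric values.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            data_points.append({
                "name": f"{metric['name']}.{field}",  # e.g. messageQueue.operation.putMessage
                "type": "gauge",                      # assumption: every field is sent as a gauge
                "value": value,
                "timestamp": metric["timestamp"],
                "attributes": dict(metric["tags"]),   # tags become queryable attributes
            })
    return [{"metrics": data_points}]


if __name__ == "__main__":
    import json

    example = {
        "name": "messageQueue.operation",
        "fields": {"putMessage": 3605},
        "tags": {"queueManager": "QM1", "queueTopic": "MY.QUEUE.1"},
        "timestamp": 1588448293,
    }
    print(json.dumps(to_metric_api_payload([example]), indent=2))
```

The practical takeaway is that Telegraf tags arrive as attributes on the metric, which is what lets us facet by queueTopic in the queries that follow.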

Exploring our Telegraf metric data in New Relic

Finally, we’ll navigate to our account in New Relic, where we can use chart builder to visualize our new metric data. Here we get three views of metric data from our message queue:

Max getMessage Operation Last 30 Minutes

Based on the data sent from our log file to New Relic, we can chart how many messages are in our queue. Specifically, this shows us the maximum number of messages for a GET operation for any topic in our queue for the last 30 minutes.

NRQL query

SELECT max(messageQueue.operation.getMessage) FROM Metric WHERE metricName = 'messageQueue.operation.getMessage' SINCE 30 minutes AGO

Max getMessage Operation Last 30 Minutes Faceted by Topic

As in the previous chart, this shows us the maximum number of messages for a GET operation in our queue for the last 30 minutes, but here we’ve used the FACET clause to group the results by the queueTopic attribute.

NRQL query

SELECT max(messageQueue.operation.getMessage) FROM Metric WHERE metricName = 'messageQueue.operation.getMessage' SINCE 30 minutes AGO TIMESERIES FACET queueTopic
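For readers new to faceting, this small Python sketch emulates the grouping locally on a handful of data points. It ignores the TIMESERIES bucketing and is only meant to illustrate the idea; max_by_facet and the data point shape are hypothetical, with values taken from the sample log lines.

```python
def max_by_facet(points, metric_name, facet):
    """Group matching data points by one attribute and keep the maximum per group."""
    result = {}
    for point in points:
        if point["metricName"] != metric_name:
            continue
        key = point.get(facet)
        value = point["value"]
        if key not in result or value > result[key]:
            result[key] = value
    return result


if __name__ == "__main__":
    points = [
        {"metricName": "messageQueue.operation.getMessage", "queueTopic": "MY.QUEUE.1", "value": 2000},
        {"metricName": "messageQueue.operation.getMessage", "queueTopic": "MY.QUEUE.2", "value": 400},
        {"metricName": "messageQueue.operation.putMessage", "queueTopic": "MY.QUEUE.1", "value": 3605},
    ]
    print(max_by_facet(points, "messageQueue.operation.getMessage", "queueTopic"))
```

Each distinct queueTopic value gets its own result, which is why the faceted chart draws one line per topic.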

Max getMessage Operation Last 12 Hours Faceted by Topic (show as table)

This chart shows us the maximum number of messages for a GET operation for any topic in our queue for the last 12 hours. Here we facet this data by the queueTopic attribute and a time grouping of one hour.

NRQL query

SELECT max(messageQueue.operation.getMessage) FROM Metric WHERE metricName = 'messageQueue.operation.getMessage' SINCE 12 hours AGO FACET queueTopic, hourOf(timestamp)

Bringing open source telemetry into New Relic

As you begin to revise and automate workflows to get telemetry out of all the services and components in your architecture, it’s important to realize that telemetry data formats are heterogeneous, sometimes quirky, and may require a highly refined and configurable toolkit. By partnering with open source tools like Telegraf, we aim to give you the confidence that New Relic can ingest all the telemetry your systems create into our telemetry database. From there, you can curate and view it within the context of your other data assets.

Alongside the New Relic output plugin for Telegraf, we’ve recently provided some open source integrations built on top of our Metric and Trace APIs, including integrations for Prometheus, OpenCensus and OpenTelemetry, Micrometer, Dropwizard, and Istio. In addition to those, we’ve added New Relic Flex to our New Relic Infrastructure agent, which allows you to build “codeless” integrations on top of New Relic Infrastructure to collect metric data from a wide variety of services.

Sign up for a free New Relic account, and get started with Telegraf and New Relic today.

Jim Hagan is a Boston-based Enterprise Solution Consultant with New Relic. He has 20 years of experience as a software engineer, with expertise in geospatial technology and time series analytics. Before joining New Relic, he worked on highly distributed logging and metrics platforms at Wayfair.
