When working with distributed logging for the first time, a developer’s first instinct might be to send application logs directly to a logging backend. A direct connection is appealing: after all, there are fewer moving parts to manage. Such communication typically happens over transactional REST APIs, which can give developers a false sense of security that all their logs will get through.

Unfortunately, there are three points of fragility with this model:

  1. Backpressure on HTTP requests can disrupt the normal functions of instrumented code, particularly if logs reach an unexpected size or rate.
  2. Network latency between the application and the logging backend can delay the delivery of log data.
  3. Network connectivity issues can lead to lost logs, which is especially troubling when the missing logs from an APM-instrumented application are exactly the ones that correlate with an outage.

A central principle of site reliability engineering is the constant observation of system telemetry to make systems less fragile. Decoupling your log forwarder from your application serves that goal directly: when the forwarder runs as a separate process, you can keep complex processing computations out of your application code, and you can update the forwarder’s code or configuration frequently without worrying about affecting the underlying application.

Additionally, log forwarders have built-in memory- or file-based buffering, which provides critical flexibility in the face of latencies and interruptions between the application data center and the logging backend.
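As a minimal sketch of what that buffering looks like in practice, here is a hypothetical Fluentd output with a file-based buffer; the path, intervals, and size limits are illustrative placeholders, not recommendations:

```
<match **>
  @type newrelic
  license_key <NEW_RELIC_LICENSE_KEY>
  <buffer>
    @type file                    # persist chunks to disk, not just memory
    path /var/log/fluentd-buffer  # illustrative buffer location
    flush_interval 10s
    retry_max_times 10
    chunk_limit_size 8MB
    total_limit_size 1GB
  </buffer>
</match>
```

With a file buffer like this, a temporary backend outage drains from disk once connectivity returns, rather than being lost from process memory.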

Log forwarders also can be:

  • Extended to support a wider variety of network protocols and output formats than underlying application code
  • Separated at the infrastructure level (i.e., you can run them on different hosts or in different containers)
  • Load balanced
  • Used in complex pipelines to achieve upstream aggregation and durable buffering

In this post, I’ll share five enterprise-ready patterns for using log forwarders to deliver logs to a logging backend, such as New Relic Logs—a central component of New Relic One. These patterns will give you a general understanding of the practical choices you can make to reduce fragility in your overall log pipeline by reducing latency, error, and saturation. My goal is to demystify the process of distributed logging and provide practical patterns you can use today.

Choosing a log forwarder

Logstash (part of the ELK stack), Rsyslog, and Fluentd are common and relatively easy to use log forwarders. Newer forwarders like Wayfair’s Tremor and Timber.io’s Vector were built for high-performance use cases, but it’s well beyond the scope of this document to compare and contrast all of them.

The New Relic Infrastructure agent supports log forwarding by means of a Fluent Bit extension, and its configuration is generally compatible with Fluent Bit syntax. If you deploy the Infrastructure agent, it’s convenient to use that built-in forwarder to send your logs. In this post, however, my example configurations use Fluentd and Fluent Bit as standalone forwarders. You can use these in scenarios where you can’t install the Infrastructure agent or you want a centralized forwarding layer to handle multiple distributed sources.

Note: In addition to Fluentd and Fluent Bit, Logstash and Vector also provide plugins for New Relic Logs. Other forwarders not listed here may offer flexible output options that let you send logs to any backend, including New Relic.

Consider the following characteristics to determine whether you need Fluentd or Fluent Bit:

 

Fluentd

  Pros:

  • Hundreds of plugins available
  • Receives frequent updates

  Cons:

  • Built in Ruby and requires several Ruby gem dependencies
  • High memory footprint

Fluent Bit

  Pros:

  • Compiled in C with zero additional dependencies
  • Low memory footprint
  • Suitable for IoT and embedded systems

  Cons:

  • Receives less frequent updates
  • Only 15 plugins available

Install Fluentd or Fluent Bit

New Relic Logs offers a fast, scalable log management platform that allows you to connect your log data with the rest of your telemetry data. Pre-built plugins for Fluentd and Fluent Bit (and others) make it simple to send your data from anywhere to New Relic One.

See the New Relic documentation for installation and configuration instructions:

The following examples assume you want to forward logs to New Relic, in which case you’ll need a New Relic license key. If you don’t yet have a license key but want to test basic forwarding functionality, you can download a forwarder (see the links below) and configure it to write to an output file for testing purposes.

Note: The forks of these tools that are built for package managers follow the naming convention of td-agent (Fluentd) and td-agent-bit (Fluent Bit).

Fluent Bit packages:

Fluentd packages:

Now let’s look at five patterns for forwarding logs to New Relic One (or whatever backend you want to use). For each pattern we’ll look at the pros and cons, and I’ve also included some example configurations.

Pattern 1: Co-located forwarder with a file tailer

In this pattern, you’d use a file tailer to watch one or more log files and send new lines to your logging backend as they’re written. The forwarder is co-located with the applications on the application host. Most forwarders have rich configuration options to determine exactly how the tailing works and what kind of buffering is used when posting to the logging backend. For additional scalability, a co-located forwarder can also forward on to an off-host forwarder layer (similar to pattern 3, discussed below).

Pros of pattern 1

  • This pattern scales automatically with your application infrastructure, as there is one forwarder per unit of infrastructure.
  • Since this pattern uses log files, you can forward logs from legacy applications that you may not be able to rebuild with modern logging tools.

Cons of pattern 1

  • Each forwarder can consume considerable compute resources, which can lead to systematic over-deployment of your application infrastructure.
  • Configurations for the co-located forwarder become a dependency for each application’s configuration, which can add to the complexity of application configuration and deployment.
  • At points of peak load, log files can grow so large that the file tailer (as well as log-rotation utilities) may not be able to keep up, causing log lag and possible storage issues.

Example configuration for pattern 1

Note: The examples here (and in the rest of this post) refer to a Python APM agent logger configuration. For an example of how to fully configure logs in context with the Python agent, see the documentation. In addition, see the official Python documentation on setting up logging handlers to ensure you set up your output correctly.

.
.
# Instantiate a new log handler
handler = logging.FileHandler('/var/log/app-a.log')
.
.
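Filled out, a minimal self-contained version of that handler setup might look like the sketch below; the logger name and format string are my own illustrations, not part of the APM agent configuration:

```python
import logging

LOG_PATH = "/var/log/app-a.log"  # same path the forwarder tails

def build_logger(path=LOG_PATH):
    """Build a logger that writes one line per record to a file the
    forwarder can tail. Name and format are illustrative choices."""
    logger = logging.getLogger("app-a")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Keeping each record on a single line matters here, because the tail input in the forwarder treats each new line as one log event.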

Fluentd

<source>
  @type tail
  <parse>
    @type none
  </parse>
  path /var/log/app-a.log
  tag app-a
</source>
<source>
  @type tail
  <parse>
    @type none
  </parse>
  path /var/log/app-b.log
  tag app-b
</source>
<match **>
  @type newrelic
  license_key <NEW_RELIC_LICENSE_KEY>
  base_uri https://log-api.newrelic.com/log/v1
</match>

Fluent Bit

[INPUT]
    Name        tail
    Path        /var/log/app-a.log 

[INPUT]
    Name        tail
    Path        /var/log/app-b.log

[OUTPUT]
    Name newrelic
    Match *
    licenseKey <NEW_RELIC_LICENSE_KEY>

Pattern 2: Co-located forwarder using sockets

In this pattern, your application code will send logs directly to your forwarder over a UDP or TCP port (no files are stored), and the forwarder will in turn forward them asynchronously to New Relic (using the New Relic Log API). The forwarder sits between your app and the New Relic backend and provides a minimal buffer and processing layer.

Note: It’s beyond the scope of this post to provide guidance on whether you should use UDP or TCP. In general, UDP will have the least impact on your application, but the protocol offers weaker delivery guarantees. For this reason, most high-volume log environments eventually gravitate toward UDP for sending logs to a forwarder. Certain security applications, such as SIEM software, will still tend toward TCP to ensure completeness.

Pros of pattern 2

  • This pattern scales automatically with your application infrastructure, as there is one forwarder per unit of infrastructure.
  • This pattern uses socket protocols for log input, so there is no need to store or rotate files. And since the forwarder is co-located, the network overhead is low.
  • One forwarder can receive logs from any application that can access it over the network interface.

Cons of pattern 2

  • Each forwarder can consume considerable compute resources, which can lead to systematic over-deployment of your application infrastructure.
  • Configurations for the co-located forwarder become a dependency for each application’s configuration, which can add to the complexity of application configuration and deployment.
  • You’ll have no physical log file that you can explore to troubleshoot applications on the host (unless you write that as a different configuration).
  • The TCP protocol can still cause backpressure into the application.
  • You’ll need to tune TCP and UDP kernel parameters for this use case.
  • You’ll need to monitor system telemetry, such as:
    • UDP Buffers (for UDP)
    • UDP Buffer Receive Errors
    • TCP Errors

Example configuration for pattern 2

UDP

# UDP Example
.
# Instantiate a new log handler
handler = logging.handlers.DatagramHandler('localhost', 5160)
.
.

TCP

# TCP Example
.
# Instantiate a new log handler
handler = logging.handlers.SocketHandler('localhost', 5170)
.
.
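One caveat worth noting: the standard library’s DatagramHandler and SocketHandler send pickled LogRecord objects rather than formatted text, so a forwarder input configured with a plain-text parser (like the @type none parser below) won’t see readable lines. A minimal sketch of a handler that sends the formatted line instead (the class name is my own):

```python
import logging
import logging.handlers

class PlainTextDatagramHandler(logging.handlers.DatagramHandler):
    """Variant of DatagramHandler that sends the formatted log line as
    UTF-8 text instead of the pickled LogRecord the base class sends."""
    def makePickle(self, record):
        # The base class pickles the record; a plain-text forwarder input
        # expects one newline-terminated text line per event instead.
        return (self.format(record) + "\n").encode("utf-8")

# Hypothetical usage, matching the UDP example above:
handler = PlainTextDatagramHandler("localhost", 5160)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
```

The same override works for SocketHandler on the TCP side, since both handlers route their payload through makePickle().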

Fluentd

<source>
  @type udp
  <parse>
    @type none
  </parse>
  tag udp_5160
  port 5160
  bind 0.0.0.0
</source>  

<source>
  @type tcp
  tag tcp_5170
  <parse>
    @type none
  </parse>
  port 5170
  bind 0.0.0.0
</source>
 
<match **>
  @type newrelic
  license_key <NEW_RELIC_LICENSE_KEY>
  base_uri https://log-api.newrelic.com/log/v1
</match>

Fluent Bit
Note: UDP is not supported as a built-in plugin for Fluent Bit.

[INPUT]
    Name        tcp
    Listen      0.0.0.0
    Port        5170
 
[OUTPUT]
    Name newrelic
    Match *
    licenseKey <NEW_RELIC_LICENSE_KEY>

Pattern 3: Separately located forwarder using sockets

In this pattern, your log forwarder is located outside the application host. Your application code will send your logs to your forwarder over a UDP or TCP port, and the forwarder will in turn use the Log API (encapsulated in its New Relic output plugin) to forward them to New Relic One.

Moving your forwarder into separate infrastructure gives you an economy of scale regarding compute resource utilization, and centralizes configuration and maintenance to the log forwarder infrastructure. With this pattern, different application families can send log data into the same infrastructure pool. You could even use different ports or protocols, as well as pattern matching, to do custom handling of logs coming from different sources if needed.

Pros of pattern 3

  • In this pattern, you have specific infrastructure provisioned specifically to handle logs; you don’t need to provision the forwarder as part of your application infrastructure.
  • You can send logs from different application pools into the same log forwarding pool.
  • You’ll eliminate backpressure on the application, since you can scale your forwarder independently of it.
  • You can use a number of powerful methods for achieving a durable buffer; for example, you could use Apache Kafka to store logs before shipping them to New Relic One or your logging backend.
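As a hedged sketch of the Kafka option above: Fluentd’s fluent-plugin-kafka gem provides a kafka2 output that can serve as that durable buffer stage. The broker and topic names here are placeholders:

```
<match **>
  @type kafka2
  brokers kafka1.mycompany.com:9092   # placeholder broker address
  default_topic app-logs              # placeholder topic name
  <format>
    @type json
  </format>
</match>
```

A second Fluentd tier (or another consumer) would then read from that topic and ship to the logging backend at its own pace.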

Cons of pattern 3

  • You’ll have no physical log file that you can explore to troubleshoot applications on the host (unless you write that as a different configuration).
  • You’ll have to maintain a new class of infrastructure with independent configuration.
  • You can still run into the issue of a “hot sender” application that can overwhelm one forwarder in the forwarder pool. (In the next pattern, I’ll show how to eliminate this concern using a load balancing layer in front of your forwarder pool.)

Example configuration for pattern 3

Your Python agent logging configuration will be nearly identical to the one in pattern 2, but you’ll need to use the public IP address or DNS name of the forwarder host:

handler = logging.handlers.DatagramHandler('forwarder1.host.mycompany.com', 5160)

Pattern 4: Separately located forwarder with load balancing

As in the previous pattern, the forwarder layer is located outside the application. In this case, however, the forwarder layer is installed behind a load balancer. Your application code sends logs to the load balancer, usually over a UDP port, and the load balancer sends the data to an appropriate instance of the forwarder based on common load balancing rules (for example, round robin). Each forwarder is configured identically to the forwarder in the previous pattern.

Fundamentally, there is nothing different about the configuration of this forwarder pool except the load balancer layer. In this case, an Nginx load balancer ensures that no single application process can overwhelm a particular forwarder. Your applications send logs using a round-robin DNS record associated with the Nginx UDP load balancer.

Pros of pattern 4

  • This pattern allows for massive scalability.
  • This is well suited for a multi-tenant log forwarding infrastructure.
  • You’ll get high availability of your log forwarding infrastructure.
  • Logging “spikes” from “hot senders” will be better distributed so one application that may be sending an excessively high volume of logs can’t clog the forwarding endpoint.

Cons of pattern 4

  • You’ll need to maintain a new class of infrastructure with independent configuration.

Example configuration for pattern 4

Your Python logging configuration will be nearly identical to patterns 2 and 3, but it will be necessary to use the DNS name associated with the Nginx load balancer.

handler = logging.handlers.DatagramHandler('loglb.mycompany.com', 5160)

Configure your load balancer as follows:

# Load balance UDP log traffic across three forwarders
stream {
   upstream log_upstreams {
       server <IP of forwarder 1 of 3>:5160;
       server <IP of forwarder 2 of 3>:5160;
       server <IP of forwarder 3 of 3>:5160;
   }
   server {
       listen 5160 udp;
       proxy_pass log_upstreams;
       proxy_timeout 1s;
       proxy_responses 1;
       error_log logs/log-lb.log;
   }
}

Pattern 5: Log interpretation and routing

This set of patterns can be mixed in with any of the other data transfer patterns we’ve covered. For most enterprise use cases, you’ll need to enrich, filter, and appropriately route your logs. Implementations of these patterns tend to be forwarder dependent, but most of the common forwarders will support these patterns. (For more background on how log events are processed, I’d recommend Life of a Fluentd event.)

Pros

  • This set of patterns allows for massive scalability.
  • These patterns mitigate anomalous spikes in log rate and log size without crashing your forwarders.

Cons

  • These patterns add a layer of complexity, as dropping and routing log data can obfuscate the logstream.
  • As you enable various types of filters, forwarders can consume a lot of your CPU and RAM resources. Be sure to use filters in a sensible way. If you’re using more than one set of filters, always run the filters that exclude the most data first, so you don’t unnecessarily process data that’s going to be dropped anyway.

Let’s take a look at some examples of filtering, enriching, and routing.

Filtering

Select which logs to include in a log stream.

Fluentd supports the match directive for each output plugin. The match directive looks for events with matching tags and processes them.

  • Allow all
    <match **>
        … New Relic account A..
    </match>
  • Allow only records tagged as being from certain applications
    <match app.customer_info>
        ...
    </match>

One drawback to note: if there is any intermediate processing in the forwarder, each record incurs that processing overhead even if you ultimately discard the record. You can also filter within the Fluentd event processing pipeline, which allows you to discard unnecessary records as early as possible.

<source>
  … input configs…
  tag app-a
</source>

<source>
  … input configs…
  tag app-b
</source>

<filter app-a>
  … expensive operation only for app-a…
</filter>

<match **>
   … output configs…
</match>

Another extremely useful pattern is to drop or exclude unwanted content. For example, you could use a grep filter with an <exclude> directive to drop certain logs from your stream, such as logs containing personally identifiable information (PII).

<filter **>
  @type grep
  <exclude>
    key message
    pattern /USERNAME/
  </exclude>
</filter>
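For pipelines where you pre-filter in application code before records ever reach the forwarder, the same exclusion logic can be sketched in Python; the pattern and field name mirror the grep example above:

```python
import re

# Drop any record whose "message" field matches the exclusion pattern,
# mirroring the grep <exclude> example above.
EXCLUDE = re.compile(r"USERNAME")

def filter_records(records, pattern=EXCLUDE, key="message"):
    """Return only the records whose message does NOT match the pattern."""
    return [r for r in records if not pattern.search(r.get(key, ""))]
```

Filtering this early means excluded records never consume network or forwarder resources at all, at the cost of coupling the rule to application code.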

Enriching

Add or alter content in an existing log in a stream.

Enrichment generally entails adding or updating an element in the record being processed. Fluentd provides a number of operators for this; for example, record_transformer.

# add host_param to each record.
<filter app.customer_info>
  @type record_transformer
  <record>
     host_param "#{Socket.gethostname}"
  </record>
</filter>
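The equivalent enrichment in application code is a one-liner; this sketch (my own, treating a record as a plain dict) mirrors the record_transformer example above:

```python
import socket

def add_host_param(record):
    """Return a copy of the record with host_param added, mirroring the
    record_transformer example above."""
    enriched = dict(record)  # copy, so the original record is untouched
    enriched["host_param"] = socket.gethostname()
    return enriched
```

Doing this in the forwarder (as in the Fluentd example) is usually preferable, though, since the hostname is a property of the forwarding environment rather than the application.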

These elementary examples don’t do justice to the full power of tag management supported by Fluentd. See the rewrite_tag_filter documentation for some great examples of how to use the rewrite_tag_filter plugin to inject a tag into a record, which allows for great flexibility in the downstream pipeline.

Routing

Send different types of logs to different backends (or to a separate New Relic account, for example).

Fluentd supports a number of powerful routing techniques that allow you to send different events to completely different backends. Two practical examples of routing are:

  1. Send logs from one application to a specific New Relic account but send logs from all other applications to a different New Relic account.
  2. Send logs to two different outputs: a) New Relic and b) a cloud storage bucket for long-term archiving.
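The second example can be sketched with Fluentd’s built-in copy output, which duplicates each event to every <store> block. The S3 store requires the fluent-plugin-s3 gem, and the bucket and region here are placeholders:

```
<match **>
  @type copy
  <store>
    @type newrelic
    license_key <NEW_RELIC_LICENSE_KEY>
  </store>
  <store>
    @type s3
    s3_bucket my-log-archive   # placeholder bucket name
    s3_region us-east-1        # placeholder region
    path logs/
  </store>
</match>
```

For the first example (per-application account routing), you would instead use tag-specific match blocks, each with its own license_key, as in the filtering section above.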

Don’t overwhelm your system

In the modern context of SRE—where full stack observability is critical—it makes sense to devote considerable thought to how your upstream logging implementation will scale and how it can be resilient to various latencies and interruptions that may occur between data centers over complex wide area networks.

Under certain anomalous conditions, even well-behaved applications can suddenly start to emit logs of unexpected size (multi-MB stack traces or object dumps) or at previously unforeseen rates (millions per minute). The patterns I’ve shown provide a substantial part of the toolkit needed to ensure those anomalies don’t overwhelm any single part of the system or cause unacceptable disruption in observability.


If you’re looking for more great New Relic Logs content from our experts, don’t miss How To Set Up Logs In Context For A Java Application Running In Kubernetes.

And be sure to request a demo of New Relic Logs today!

Jim Hagan is a Boston-based Enterprise Solution Consultant with New Relic. He has 20 years of experience as a software engineer, with expertise in geospatial technology and time series analytics. Before joining New Relic, he worked on highly distributed logging and metrics platforms at Wayfair.
