Apache Kafka is a popular distributed streaming platform that thousands of companies like New Relic, Uber, and Square use to build scalable, high-throughput, and reliable real-time streaming systems. But managing such a platform is no easy feat. Amazon Managed Streaming for Apache Kafka (MSK) abstracts away the management of Kafka, so you don’t have to provision and operate Kafka clusters yourself.

With open monitoring enabled, Amazon MSK exposes metrics in a Prometheus-compatible format. Because the New Relic Prometheus OpenMetrics integration collects metrics from any Prometheus-compatible endpoint, you can use it to send MSK metrics to the New Relic One platform.


By collecting Amazon MSK metrics in New Relic One, you’ll be able to combine that data with agent-based APM and Infrastructure data; log data from your applications and hosts; and other third-party telemetry data like distributed traces to create an entity-centric system of record. You can then use this combined data to build dashboard charts and set alerts, with the aim of creating observability within your entire application stack.

In this post, we’ll explain how to collect and use Amazon MSK metrics in New Relic.

Step 1: set up an Amazon MSK cluster

To set up a new Amazon MSK cluster, follow the steps in the Amazon MSK getting started guide.

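You can use the following definition file (in JSON) when setting up your cluster. Its OpenMonitoring section enables the Prometheus JMX Exporter and Node Exporter on each broker. Replace the placeholder subnet and security group IDs with your own: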

clusterinfo.json

{
  "BrokerNodeGroupInfo": {
    "InstanceType": "kafka.m5.large",
    "ClientSubnets": [
      "subnet-1",
      "subnet-2",
      "subnet-3"
    ],
    "SecurityGroups": [
      "sg-1"
    ]
  },
  "EncryptionInfo": {
    "EncryptionInTransit": {
      "InCluster": false,
      "ClientBroker": "PLAINTEXT"
    }
  },
  "ClusterName": "PrometheusTest",
  "EnhancedMonitoring": "PER_TOPIC_PER_BROKER",
  "KafkaVersion": "2.2.1",
  "NumberOfBrokerNodes": 3,
  "OpenMonitoring": {
    "Prometheus": {
      "JmxExporter": {
        "EnabledInBroker": true
      },
      "NodeExporter": {
        "EnabledInBroker": true
      }
    }
  }
}
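Without both exporters enabled, the Prometheus endpoints on ports 11001 and 11002 won’t exist, so it can be worth checking the definition before you create the cluster. The Python below is a minimal, hypothetical pre-flight check (the function names are illustrative and not part of the AWS CLI or the MSK API); it looks only at the OpenMonitoring section shown above.

```python
import json

# Hypothetical pre-flight check (not an AWS tool): confirm that a cluster
# definition such as clusterinfo.json enables both open-monitoring exporters.
EXPORTERS = ("JmxExporter", "NodeExporter")


def missing_exporters(definition: dict) -> list:
    """Return the names of Prometheus exporters that are not enabled."""
    prometheus = definition.get("OpenMonitoring", {}).get("Prometheus", {})
    return [
        name
        for name in EXPORTERS
        # A missing section, or any value other than true, counts as disabled.
        if prometheus.get(name, {}).get("EnabledInBroker") is not True
    ]


def load_definition(path: str) -> dict:
    """Read a JSON cluster definition file from disk."""
    with open(path) as f:
        return json.load(f)


# Inline copy of the OpenMonitoring section from clusterinfo.json.
sample = json.loads("""
{
  "OpenMonitoring": {
    "Prometheus": {
      "JmxExporter": {"EnabledInBroker": true},
      "NodeExporter": {"EnabledInBroker": true}
    }
  }
}
""")
print(missing_exporters(sample))  # prints []
```

To check your own file, call missing_exporters(load_definition("clusterinfo.json")); an empty list means both exporters are enabled.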
  1. Make sure your security group has inbound rules that allow access to the Prometheus metrics on ports 11001 (JMX Exporter) and 11002 (Node Exporter). (For details on managing security groups, refer to the AWS documentation.)
  2. Next, find the DNS names of your Kafka broker nodes:
    aws kafka list-nodes --cluster-arn "arn:<cluster ARN>"

    The result should be similar to:

    "NodeInfoList": [
        {
            "AddedToClusterTime": "2019-11-28T07:28:30.421Z",
            "BrokerNodeInfo": {
                "AttachedENIId": "eni-XXX",
                "BrokerId": "2",
                "ClientSubnet": "subnet-2",
                "ClientVpcIpAddress": "172.31.1.2",
                "CurrentBrokerSoftwareInfo": {
                    "KafkaVersion": "2.2.1"
                },
                "Endpoints": [
                    "b-2.prometheustest.XXXX.kafka.us-east-2.amazonaws.com"
                ]
            },
            "InstanceType": "m5.large",
            "NodeARN": "arn:aws:kafka:us-east-2:XXX",
            "NodeType": "BROKER"
        },
  3. Choose the node in your subnet (see ClientSubnet) and note its DNS name from Endpoints. You’ll need it to configure the New Relic Prometheus OpenMetrics integration.
  4. From a host inside the same VPC, confirm you can reach the Prometheus endpoints:
    curl <BrokerDNS>:11001/metrics
    curl <BrokerDNS>:11002/metrics
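With several brokers, picking the node in your subnet by hand gets tedious. Here is a minimal Python sketch, assuming you have the JSON printed by aws kafka list-nodes as a string; it keeps broker nodes whose ClientSubnet matches and builds the two scrape URLs for each. The helper names are hypothetical.

```python
import json

# Ports used by Amazon MSK open monitoring.
JMX_EXPORTER_PORT = 11001   # Kafka broker metrics (JMX Exporter)
NODE_EXPORTER_PORT = 11002  # host-level metrics (Node Exporter)


def brokers_in_subnet(list_nodes_output: str, subnet: str) -> list:
    """Return endpoint DNS names of broker nodes whose ClientSubnet matches."""
    nodes = json.loads(list_nodes_output).get("NodeInfoList", [])
    endpoints = []
    for node in nodes:
        if node.get("NodeType") != "BROKER":
            continue
        info = node.get("BrokerNodeInfo", {})
        if info.get("ClientSubnet") == subnet:
            endpoints.extend(info.get("Endpoints", []))
    return endpoints


def scrape_urls(broker_dns: str) -> list:
    """Build the metrics URLs for one broker, one per exporter."""
    return [
        f"http://{broker_dns}:{port}/metrics"
        for port in (JMX_EXPORTER_PORT, NODE_EXPORTER_PORT)
    ]


# Trimmed version of the sample output above (placeholder values).
sample_output = """
{"NodeInfoList": [
  {"NodeType": "BROKER",
   "BrokerNodeInfo": {
     "ClientSubnet": "subnet-2",
     "Endpoints": ["b-2.prometheustest.XXXX.kafka.us-east-2.amazonaws.com"]}}
]}
"""
for dns in brokers_in_subnet(sample_output, "subnet-2"):
    print(scrape_urls(dns))
```

For example, you could save the command output with aws kafka list-nodes --cluster-arn "arn:<cluster ARN>" --output json > nodes.json and pass open("nodes.json").read() as list_nodes_output.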

Step 2: set up the New Relic Prometheus OpenMetrics integration

If you haven’t already done so, create an EC2 instance with Docker installed in the same Amazon Virtual Private Cloud (VPC) as your MSK cluster.

Next, deploy the New Relic Prometheus OpenMetrics integration:

  1. Create a configuration file (config.yaml), or use our example configuration file.

    Important: Be sure to change the cluster_name setting.

  2. Add the Amazon MSK endpoints to the targets section of the configuration file:
    targets:
    - description: MSK
      urls: ["http://<BrokerDNS>:11001/metrics", "http://<BrokerDNS>:11002/metrics"]
  3. Add the http://localhost:8080/metrics endpoint to collect metrics about the integration itself.
  4. Start the integration with the following command:
    docker run -d --restart unless-stopped \
        --name nri-prometheus \
        -e LICENSE_KEY="YOUR_LICENSE_KEY" \
        -v "$(pwd)/config.yaml:/config.yaml" \
        newrelic/nri-prometheus:1.2

    Important: Replace YOUR_LICENSE_KEY with your New Relic license key (required).

  5. Confirm the container is running properly:
    docker ps -f "name=nri-prometheus"
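If you scrape more than one broker, writing the urls list by hand is error-prone. The Python below is a hypothetical helper (not part of nri-prometheus) that renders the cluster_name and targets settings from steps 1 to 3 for a list of broker DNS names. It covers only those keys, and it assumes the localhost:8080 endpoint can go in the same urls list; merge its output with the rest of the example configuration file.

```python
import json


def render_config(cluster_name: str, broker_dns_names: list) -> str:
    """Render the cluster_name and targets settings as YAML text."""
    urls = []
    for dns in broker_dns_names:
        # Each broker serves JMX Exporter (11001) and Node Exporter (11002) metrics.
        urls.append(f"http://{dns}:11001/metrics")
        urls.append(f"http://{dns}:11002/metrics")
    # The integration's own metrics endpoint (step 3).
    urls.append("http://localhost:8080/metrics")
    # JSON strings and arrays are valid YAML flow syntax, so json.dumps
    # handles the quoting for us.
    lines = [
        f"cluster_name: {json.dumps(cluster_name)}",
        "targets:",
        "- description: MSK",
        f"  urls: {json.dumps(urls)}",
    ]
    return "\n".join(lines) + "\n"


print(render_config(
    "msk-prometheus-test",  # hypothetical cluster name
    ["b-2.prometheustest.XXXX.kafka.us-east-2.amazonaws.com"],
))
```

Writing the output to config.yaml in the directory where you run docker run matches the -v mount in step 4.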

For more details about configuration options and using the Prometheus OpenMetrics integration, see the New Relic documentation.

Using your Amazon MSK metrics data in New Relic

After you get the integration running, it will immediately start sending Amazon MSK metrics to New Relic. The following examples show how to use these metrics.

Example 1: monitor the filesystem on Kafka nodes

Kafka’s log retention settings determine how much data each broker keeps on disk, so a generous retention policy or a spike in traffic can fill up the disks on your nodes. Monitor disk usage to make sure your nodes stay healthy.

In the New Relic One chart builder, select the metric node_filesystem_avail_bytes, and in the Facet by field, select the device attribute to see the available filesystem space per device in your cluster.


New Relic One dashboard chart that shows filesystem usage in an Amazon MSK cluster.
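The chart shows available bytes, but a percentage in use is often easier to reason about. As a sketch of the arithmetic, the Python below parses a Node Exporter scrape (port 11002) and computes 100 * (1 - avail / size) per device. It assumes the node_filesystem_size_bytes metric, which Node Exporter publishes alongside node_filesystem_avail_bytes, and uses a deliberately minimal parser rather than a full Prometheus client library.

```python
import re

# Minimal matcher for Prometheus text-format sample lines such as
#   node_filesystem_avail_bytes{device="/dev/sdb",mountpoint="/data"} 2.5e+10
# This is a sketch: it does not handle escaped quotes inside label values,
# and it ignores any trailing timestamp.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_by_device(text: str, metric: str) -> dict:
    """Map the device label to the sample value for one metric name.

    If a device appears under several mountpoints, the last sample wins.
    """
    values = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # HELP and TYPE comment lines
        match = SAMPLE_RE.match(line)
        if not match or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2) or ""))
        values[labels.get("device", "")] = float(match.group(3))
    return values


def used_percent_by_device(text: str) -> dict:
    """Percent of each filesystem in use: 100 * (1 - avail / size)."""
    avail = parse_by_device(text, "node_filesystem_avail_bytes")
    size = parse_by_device(text, "node_filesystem_size_bytes")
    return {
        device: 100.0 * (1.0 - avail[device] / size[device])
        for device in avail
        if size.get(device)  # skip devices with a missing or zero size
    }


# Illustrative excerpt of a port 11002 scrape (made-up device and values).
sample = """\
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{device="/dev/sdb",mountpoint="/data"} 2.5e+10
# TYPE node_filesystem_size_bytes gauge
node_filesystem_size_bytes{device="/dev/sdb",mountpoint="/data"} 1e+11
"""
print(used_percent_by_device(sample))  # prints {'/dev/sdb': 75.0}
```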

Example 2: monitor the producer request rate

To ensure a healthy pipeline, you can track the average number of producer requests sent per second.

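In the chart builder, select the metric kafka_server_BrokerTopicMetrics_Count and filter on name = TotalProduceRequestsPerSec. The underlying JMX value is a running total of produce requests, so if your chart shows a steadily increasing line, chart its rate of change instead to see requests per second.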
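A per-second figure from a running total is the difference between two readings divided by the time between them. The Python below sketches that arithmetic; it is an illustration, not how New Relic computes rates internally, and the sample values are made up. Like Prometheus’s rate(), it treats a decrease as a counter reset (for example, after a broker restart).

```python
from typing import Optional


def per_second_rate(prev_value: float, prev_ts: float,
                    curr_value: float, curr_ts: float) -> Optional[float]:
    """Average increase per second between two counter samples.

    Timestamps are in seconds. Returns None when the samples are not in
    increasing time order, since no meaningful rate exists then.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        return None
    delta = curr_value - prev_value
    if delta < 0:
        # The counter went down, which usually means it restarted from zero;
        # count only what accumulated since the reset.
        delta = curr_value
    return delta / elapsed


# Two made-up readings of the TotalProduceRequestsPerSec count, 60 s apart.
print(per_second_rate(120_000, 0.0, 126_000, 60.0))  # prints 100.0
```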

Example 3: alert on Kafka node storage

Use New Relic Alerts to create an alert condition that notifies you when free storage on your Amazon MSK nodes falls below a critical threshold.


Setting an alert condition for node storage in an Amazon MSK cluster.
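New Relic Alerts evaluates the condition for you; the Python below only sketches the kind of logic a static-threshold condition expresses: trigger only when free space stays below the threshold for several consecutive checks, so a single noisy reading doesn’t page anyone. The 10% threshold and five consecutive checks are illustrative assumptions, not recommended values.

```python
from typing import Sequence


def should_alert(free_percent_samples: Sequence[float],
                 threshold_percent: float = 10.0,
                 consecutive: int = 5) -> bool:
    """True if the most recent `consecutive` samples are all below the threshold.

    free_percent_samples holds free-space percentages in time order,
    for example 100 * node_filesystem_avail_bytes / node_filesystem_size_bytes.
    """
    if consecutive <= 0 or len(free_percent_samples) < consecutive:
        return False  # not enough data to decide
    recent = free_percent_samples[-consecutive:]
    return all(value < threshold_percent for value in recent)


# Free-space percentages from successive checks (made-up values).
print(should_alert([40.0, 12.0, 9.0, 8.5, 8.0, 7.5, 7.0]))  # prints True
print(should_alert([9.0, 8.0, 7.0, 12.0, 6.0]))              # prints False
```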

See the Amazon MSK documentation for a full list of metrics you can collect from your cluster.