In the summer of 2018, it’s clear that the Kubernetes container orchestration platform has become a core part of the technology strategy for many New Relic customers, regardless of the industry they’re in or the digital experience they’re delivering to their customers.
This was not always the case. Originally developed by Google, Kubernetes is an offshoot of the Borg project, which was Google’s internal container-oriented cluster-management system started in 2003. Fast forward 15 years, and Kubernetes is now one of the fastest-growing open source projects in history. What’s more, there are no signs of regression—all of the major modern infrastructure platforms have now embraced Kubernetes, from Docker and Red Hat to Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure.
This rapid growth and platform diversity have given companies more flexibility to run Kubernetes how and where they want, based on their own needs. Knowing there are so many ways to run and deploy Kubernetes, we wanted to look at the data to get a better understanding of exactly how teams are using it.
How New Relic customers are using our Kubernetes integration
To understand what a “normal” cluster might look like, we looked at how our customers are using New Relic’s out-of-the-box Kubernetes integration. These are real-world numbers, but they may not generalize to all Kubernetes users; nevertheless, they represent a key data point in an area without a lot of hard data.
Here’s what we found:
|New Relic Customers’ Kubernetes Usage|Average|
|Nodes per cluster|9|
|Pods per cluster|161|
|Containers per cluster|183|
|Pods per node|18|
|Containers per node|21|
|Containers per pod|1|
Looking at this data, it’s probably safe to say that there is no such thing as a “normal” cluster. Our customers are running Kubernetes in varied ways based on their specific needs and business objectives. However, the data does suggest some interesting conclusions about Kubernetes deployments, pods, and containers—and about the cluster itself. Let’s dive into the numbers to figure out what it all means.
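As a rough cross-check, the per-node and per-pod rows can be approximated from the per-cluster rows in the table above. The sketch below simply divides the published averages. Because a ratio of averages is not the same as an average of per-cluster ratios, small differences from the table are to be expected (containers per node, for instance, comes out near 20 here rather than 21).

```python
# Cluster-level averages copied from the table above.
averages = {"nodes": 9, "pods": 161, "containers": 183}


def ratio(numerator, denominator):
    """Divide one cluster-level average by another."""
    return averages[numerator] / averages[denominator]


# Derived ratios; these approximate the per-node and per-pod rows.
derived = {
    "pods_per_node": ratio("pods", "nodes"),
    "containers_per_node": ratio("containers", "nodes"),
    "containers_per_pod": ratio("containers", "pods"),
}

for name, value in derived.items():
    print(f"{name}: {value:.2f}")
```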
- The average New Relic customer has 2 Kubernetes clusters.
One common reason to have two clusters is to dedicate one to your production applications, and another to staging and development. Using a separate cluster is a best practice—especially for organizations in early stages of Kubernetes adoption—because it allows software organizations to test and experiment with cluster configuration changes without affecting production and risking downtime.
However, as customers get more confident with Kubernetes, we anticipate that they will begin to mix staging and production environments. Rather than separating by cluster, we believe organizations will begin to use namespaces as a means of separation. For example, more advanced Kubernetes users will likely adopt a setup similar to the following:
- Integration → cluster 1 – namespace 1
- User Acceptance Testing → cluster 1 – namespace 2
- Staging → cluster 2 – namespace 1
- Production → cluster 2 – namespace 2
While namespaces do not necessarily provide stronger separation than clusters, they do provide a more flexible way to allocate resources: you can change the quotas for a namespace, whereas to grow or shrink a cluster you have to add or remove nodes. This flexibility ultimately lets you control resource allocation more efficiently.
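To make the namespace-and-quota idea concrete, here is a minimal sketch that builds a `Namespace` and a matching `ResourceQuota` manifest for one environment from the layout above. The cluster and namespace names, the helper `quota_manifests`, and the CPU and memory figures are all illustrative assumptions, not values from our data; the manifests are emitted as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json

# Environment -> (cluster, namespace), mirroring the layout listed above.
# The identifiers themselves are hypothetical.
LAYOUT = {
    "integration": ("cluster-1", "namespace-1"),
    "user-acceptance-testing": ("cluster-1", "namespace-2"),
    "staging": ("cluster-2", "namespace-1"),
    "production": ("cluster-2", "namespace-2"),
}


def quota_manifests(namespace, cpu, memory):
    """Build a Namespace plus a ResourceQuota capping its total requests."""
    return [
        {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": namespace}},
        {
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
            # Assumed limits; resizing an environment means editing these values.
            "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory}},
        },
    ]


cluster, namespace = LAYOUT["staging"]
bundle = {
    "apiVersion": "v1",
    "kind": "List",
    "items": quota_manifests(namespace, cpu="4", memory="8Gi"),
}
print(f"# apply against {cluster}")
print(json.dumps(bundle, indent=2))
```

Growing or shrinking the staging environment then means changing the two quota values and re-applying the manifest, with no nodes added or removed.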
Another possible reason to have multiple clusters is so you can assign them to specific regions. Having a cluster in each region provides better availability and lower latency, and if something goes wrong in one region, another region can provide backup. Distributing Kubernetes environments across regions in this way gives your customers a better experience at minimal cost.
- The average customer has 54 deployments running at any given time (some customers run up to 1,000 simultaneous deployments).
- The average customer has 6.5 incomplete deployments.
The relatively high number of deployments indicates that our customers are actively breaking apart their monolithic environments and moving toward microservices architectures. In a traditional monolith approach, we would see a total number of services in the single digits, so seeing customers with as many as 1,000 deployments shows a clear transition toward microservices.
The most obvious implication of this is that, unlike in traditional monolithic environments, it is impossible to know all of the details of your deployments. In a monolith you can easily track all of the details of your deployments; in a Kubernetes environment you’re likely not aware of what the last deployment was, or if it was completed.
Incomplete deployments can occur because of pending or failed pods; it’s critical that you monitor for pending or failed pods to ensure a healthy Kubernetes environment. This means you need visibility into your deployments to see the status and make sure everything is functioning as it should be.
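As one way to get that visibility, the sketch below scans the JSON that `kubectl get deployments --all-namespaces -o json` and `kubectl get pods --all-namespaces -o json` return. Treating a deployment as “incomplete” when fewer replicas are updated or available than desired is an assumption made for illustration, and the function names and sample objects are hypothetical.

```python
def incomplete_deployments(deployment_list):
    """Return names of deployments with fewer updated/available replicas than desired."""
    names = []
    for item in deployment_list.get("items", []):
        # spec.replicas defaults to 1 when it is omitted.
        desired = item.get("spec", {}).get("replicas", 1)
        status = item.get("status", {})
        updated = status.get("updatedReplicas", 0)
        available = status.get("availableReplicas", 0)
        if updated < desired or available < desired:
            names.append(item["metadata"]["name"])
    return names


def unhealthy_pods(pod_list):
    """Return (name, phase) for every pod that is Pending or Failed."""
    bad = []
    for item in pod_list.get("items", []):
        phase = item.get("status", {}).get("phase")
        if phase in ("Pending", "Failed"):
            bad.append((item["metadata"]["name"], phase))
    return bad


# Hypothetical sample data shaped like kubectl's JSON output.
sample_deployments = {"items": [
    {"metadata": {"name": "checkout"}, "spec": {"replicas": 3},
     "status": {"updatedReplicas": 3, "availableReplicas": 3}},
    {"metadata": {"name": "search"}, "spec": {"replicas": 4},
     "status": {"updatedReplicas": 4, "availableReplicas": 2}},
]}
sample_pods = {"items": [
    {"metadata": {"name": "search-abc"}, "status": {"phase": "Pending"}},
    {"metadata": {"name": "checkout-xyz"}, "status": {"phase": "Running"}},
]}

print(incomplete_deployments(sample_deployments))
print(unhealthy_pods(sample_pods))
```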
Pods and containers
- The average customer has 343 pods and 396 containers.
This data shows that most of our customers are using a “one-container-per-pod” approach, the most common Kubernetes use case. Think of a pod as a wrapper around a single container; Kubernetes manages the pods rather than the containers directly.
Alternatively, pods can run multiple containers that need to work together. A pod might encapsulate an application composed of multiple co-located containers that are tightly coupled and need to share resources. These co-located containers might form a single cohesive unit of service—one container serving files from a shared volume to the public, while a separate “sidecar” container refreshes or updates those files. The pod wraps these containers and storage resources together as a single manageable entity.
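The file-server-plus-sidecar pattern described above might look like the following pod manifest, built here as a Python dictionary and printed as JSON. This is only a sketch: the container names, the `example/content-refresher` image, and the mount paths are assumptions chosen for illustration.

```python
import json


def sidecar_pod(name):
    """Build a Pod with a web container and a sidecar sharing one volume."""
    volume_name = "shared-files"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # An emptyDir volume lives exactly as long as the pod does.
            "volumes": [{"name": volume_name, "emptyDir": {}}],
            "containers": [
                {   # Serves the shared files to the public (read-only mount).
                    "name": "web",
                    "image": "nginx",
                    "volumeMounts": [{
                        "name": volume_name,
                        "mountPath": "/usr/share/nginx/html",
                        "readOnly": True,
                    }],
                },
                {   # Hypothetical sidecar that refreshes those files.
                    "name": "content-refresher",
                    "image": "example/content-refresher:latest",
                    "volumeMounts": [{"name": volume_name, "mountPath": "/data"}],
                },
            ],
        },
    }


pod = sidecar_pod("static-site")
print(json.dumps(pod, indent=2))
```

Both containers are scheduled, started, and removed together, which is what makes the pod, rather than either container, the unit Kubernetes manages.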
Another takeaway is that customers are running a lot of pods, making it very difficult to keep track of them individually. Additionally, the large number of pods and containers creates more places where things can fail. Given the complex nature of Kubernetes, it can be difficult to understand why these failures occur.
- 94% of customers have containers restarting from failures. At any given time, customers have an average of 12 containers with issues related to failure restarts.
An isolated container restart is not necessarily cause for alarm, but frequent restarts can indicate a larger problem. Under normal conditions, containers should not restart; a restart points to an issue with either the container itself or the underlying host. For example, Kubernetes kills a container that exceeds its memory limit and, depending on the pod’s restart policy, restarts it, which causes a temporary disruption in service availability. Because memory usage resets on restart, a container that hit 100% of its memory limit may be at 30% of that limit the next time you check it, which can make resource issues in Kubernetes difficult to troubleshoot.
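Because current memory usage says little after a restart, it can help to look at the restart count and the reason for the last termination, both of which Kubernetes records in each pod’s status. The sketch below reads those fields from the JSON returned by `kubectl get pods -o json`; the function name and sample pod are hypothetical.

```python
def restart_report(pod_list):
    """List containers that have restarted, with the last termination reason."""
    report = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            if cs.get("restartCount", 0) == 0:
                continue
            # lastState.terminated describes the previous run of the container.
            terminated = cs.get("lastState", {}).get("terminated", {})
            report.append({
                "pod": pod["metadata"]["name"],
                "container": cs["name"],
                "restarts": cs["restartCount"],
                "last_reason": terminated.get("reason", "unknown"),
            })
    return report


# Hypothetical sample shaped like kubectl's JSON output.
sample = {"items": [{
    "metadata": {"name": "search-abc"},
    "status": {"containerStatuses": [
        {"name": "search", "restartCount": 5,
         "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
        {"name": "proxy", "restartCount": 0, "lastState": {}},
    ]},
}]}

for row in restart_report(sample):
    print(row)
```

A `last_reason` of `OOMKilled` indicates the container was killed for running out of memory, even if its usage looks healthy when you check it later.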
A large number of failures and restarts is a good reason to separate staging and production environments. Kubernetes works on the same idea as infrastructure as code: its configuration, as well as the configuration of the pod encapsulating your application, is managed as code. Any bugs in the configuration can propagate to the underlying infrastructure. This is not ideal in production environments because it could quickly affect the customer experience you’re delivering.
Some predictions about Kubernetes usage
Reviewing the Kubernetes environments of our customers offers valuable data points across a wide range of uses and industries. With these numbers, we can make a few predictions:
We’ll see increased Kubernetes adoption and larger workloads.
With increased adoption comes increased workloads. Customers are not only migrating their monoliths to Kubernetes, but also creating entirely new applications on top of Kubernetes. As a result, we can expect average deployment size to increase over the coming months.
Kubernetes adoption will enable cloud adoption.
Universal support for Kubernetes from the cloud giants suggests there is something in it for them. As more customers begin to deliver their applications in containers instead of in virtual machines, they’ll be able to run their applications on more cloud providers, making high-availability versions of those applications easier to build. Additionally, customers will have more opportunities to migrate software that had been stuck on-premises to the cloud.
Kubernetes could also be just the thing for customers crafting a hybrid multi-cloud approach. The portability of the Kubernetes API across multiple environments will serve to standardize the tools customers use, increase productivity, and reduce the friction of running applications across different cloud providers. We’re hearing that customers are becoming more concerned about cloud provider lock-in. The portability provided by Kubernetes offers a promising solution.
Complexity associated with Kubernetes will be mitigated by the rise of Kubernetes-supported platforms.
Our customers are leveraging managed services such as Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), and Amazon Elastic Container Service for Kubernetes (EKS). One advantage: not having to install or manage your own Kubernetes clusters is a huge time saver.
But not all of these platforms are public cloud-based. On their own, Kubernetes and containers offer limited support for service-to-service communication from a routing, security, or discovery perspective. Emerging open source projects such as Istio (backed by major players including Lyft, IBM, and Google) handle many service-to-service communication functions by integrating them into the network in a language-agnostic way. This service mesh allows developers to focus on business logic and worry less about where the application itself is running. We’re hearing more interest in Istio and anticipate more adoption of it across our customer base.
Will these predictions come to pass? Check back with us in six months or so for an updated look at how New Relic customers are using Kubernetes.