(This post is part two of a two-part series. Part one—Containerizing Databases at New Relic: What We Learned—looks at four inherent challenges of putting databases in Docker containers—and how New Relic addressed them.)
The tooling required to manage a large number of small databases is different than the tools needed to manage a few large databases. At New Relic, we’ve constructed a management framework for small- and medium-sized databases that we call Megabase. Megabase consists of microservices and tools, written in Go, that enable us to deploy replicated groups of containerized database instances to multi-tenant hosts. Megabase currently supports recent versions of MySQL (via Percona) and PostgreSQL. We’re in the process of adding support for Redis.
Our goals for Megabase were to deploy resource-efficient databases in a way that is fast, consistent, and repeatable. When considering whether to adopt containers for databases, we had concerns about configuration, durability, and performance. In this post, I’ll explain how we addressed those concerns and detail Megabase’s architecture and redundancy model and its configuration and deployment scenarios. I’ll also share a few examples of how we monitor Megabase to ensure we honor our Service Level Agreement (SLA) with internal customers.
Megabase architecture and redundancy
Megabase is built in a series of clusters, and each cluster consists of three hosts deployed across three availability zones. An instance of the Megabase API is deployed on each host. The API handles requests to run, start, and stop containers. The API also handles pre- and post-run tasks. One example of a pre-run task is the creation of Logical Volume Manager (LVM) volumes for the container. I’ll cover our use of LVM with Docker in more detail below.
Database containers are provisioned in replica sets with three instances, which we call “triads.”
In the event of a failure at the container or host level, or in an availability zone, we can fail over to one of the replica instances by reconfiguring a pool of available instances from a cluster of proxy nodes. The Megabase API has endpoints to facilitate any database-specific actions necessary for failovers. For example, if we need to fail over a PostgreSQL database, we call an API endpoint to stop the existing primary instance and rebuild it as a replica. We then call another endpoint to promote one of the replica instances.
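The PostgreSQL failover sequence described above could be sketched in Go roughly as follows. This is only an illustration: the `TriadAPI` interface, its method names, and the instance names are hypothetical stand-ins for the Megabase API endpoints, which this post does not spell out, and the proxy pool reconfiguration is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// TriadAPI is a hypothetical abstraction over the Megabase API endpoints
// used during a failover; the real endpoint names are not given in the post.
type TriadAPI interface {
	// DemoteToReplica stops the primary and rebuilds it as a replica.
	DemoteToReplica(instance string) error
	// Promote turns a replica into the new primary.
	Promote(instance string) error
}

// Failover demotes the current primary, then promotes the chosen replica.
// It stops at the first error, so nothing is promoted if the demotion fails.
func Failover(api TriadAPI, primary, replica string) error {
	if primary == replica {
		return errors.New("replica must differ from the current primary")
	}
	if err := api.DemoteToReplica(primary); err != nil {
		return fmt.Errorf("demote %s: %w", primary, err)
	}
	if err := api.Promote(replica); err != nil {
		return fmt.Errorf("promote %s: %w", replica, err)
	}
	return nil
}

// recordingAPI is an in-memory fake that records calls in order.
type recordingAPI struct {
	calls []string
}

func (r *recordingAPI) DemoteToReplica(instance string) error {
	r.calls = append(r.calls, "demote:"+instance)
	return nil
}

func (r *recordingAPI) Promote(instance string) error {
	r.calls = append(r.calls, "promote:"+instance)
	return nil
}

func main() {
	api := &recordingAPI{}
	if err := Failover(api, "newdb-a", "newdb-b"); err != nil {
		fmt.Println("failover failed:", err)
		return
	}
	fmt.Println(api.calls)
}
```

Keeping the two steps behind an interface makes the ordering easy to test with a fake before it ever touches a real database.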
The diagram below shows the traffic flow before a failover (blue) and after a failover (orange):
Megabase configuration and deployment
Each database in the triad of instances gets a configuration file and any optional SQL files needed to create users or populate schemas.
The only prerequisite for deploying to Megabase is the creation of database configuration files. There is a simple three-step process behind the injection of configuration into containers:
- We push the database configuration files to our version control repository (GitHub). We include any required database configuration files and any optional SQL files if we need to create users or populate a schema in a particular database.
- A webhook watches GitHub and sends the configuration files to a bucket in our object store.
- We pull the container images down from the Docker Image Registry and Rclone syncs the configuration files to each container.
To deploy a triad, we run the following command from our client:
mb run -e <environment> -t <cluster> --conf <config file path>
A Megabase deploy configuration file is formatted in YAML and looks like this:
---
global:
  envs:
    POLLING_INTERVAL: 900
containers:
  newdb_production:
    image: "container-registry.domain.com/db-team/mb-postgres-96-alpine:1.32"
    cpus: 4
    memory: "16384m"
    disk: "250G"
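As a rough illustration of how a client might sanity-check the per-container fields in that deploy file before submitting it, here is a minimal Go sketch. The `ContainerSpec` type, `parseSize`, and `validate` are hypothetical names, and the assumption that the `m` and `G` suffixes are binary multiples (MiB, GiB) is mine, not something the post states.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// ContainerSpec mirrors the per-container fields shown in the deploy file.
// The type itself is hypothetical.
type ContainerSpec struct {
	Image  string
	CPUs   int
	Memory string
	Disk   string
}

// parseSize converts strings such as "16384m" or "250G" into bytes.
// Assumption: suffixes k, m, g, t (case-insensitive) are binary multiples.
func parseSize(s string) (int64, error) {
	if len(s) < 2 {
		return 0, fmt.Errorf("invalid size %q", s)
	}
	units := map[string]int64{"k": 1 << 10, "m": 1 << 20, "g": 1 << 30, "t": 1 << 40}
	mult, ok := units[strings.ToLower(s[len(s)-1:])]
	if !ok {
		return 0, fmt.Errorf("unknown unit in %q", s)
	}
	n, err := strconv.ParseInt(s[:len(s)-1], 10, 64)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("invalid size %q", s)
	}
	return n * mult, nil
}

// validate rejects specs with a missing image, no CPUs, or unparsable sizes.
func validate(c ContainerSpec) error {
	if c.Image == "" {
		return errors.New("image is required")
	}
	if c.CPUs <= 0 {
		return errors.New("cpus must be positive")
	}
	if _, err := parseSize(c.Memory); err != nil {
		return fmt.Errorf("memory: %w", err)
	}
	if _, err := parseSize(c.Disk); err != nil {
		return fmt.Errorf("disk: %w", err)
	}
	return nil
}

func main() {
	spec := ContainerSpec{
		Image:  "container-registry.domain.com/db-team/mb-postgres-96-alpine:1.32",
		CPUs:   4,
		Memory: "16384m",
		Disk:   "250G",
	}
	if err := validate(spec); err != nil {
		fmt.Println("invalid spec:", err)
		return
	}
	mem, _ := parseSize(spec.Memory)
	disk, _ := parseSize(spec.Disk)
	fmt.Println("memory bytes:", mem, "disk bytes:", disk)
}
```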
And this diagram shows the entire deployment workflow:
The Megabase API also collects output from the docker inspect command, so we can use our command line tool to inspect containers across all of our clusters in a given environment. This is useful when verifying the state of triads after deploying them, or when performing operational tasks such as adjusting container resource configuration and capacity planning.
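To give a feel for what can be done with collected docker inspect output, the Go sketch below decodes the JSON array that the command prints and flattens a few fields per container. The `ContainerSummary` type and `summarize` function are hypothetical; only the JSON field names (`Name`, `State.Status`, `HostConfig.CpuShares`, `HostConfig.Memory`) come from Docker's output format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// inspectEntry captures a few fields from `docker inspect` JSON output.
type inspectEntry struct {
	Name  string `json:"Name"`
	State struct {
		Status string `json:"Status"`
	} `json:"State"`
	HostConfig struct {
		CPUShares int64 `json:"CpuShares"`
		Memory    int64 `json:"Memory"` // bytes; 0 means no limit
	} `json:"HostConfig"`
}

// ContainerSummary is a hypothetical flattened view for operators.
type ContainerSummary struct {
	Name      string
	Status    string
	CPUShares int64
	MemoryMiB int64
}

// summarize decodes the JSON array printed by `docker inspect`.
func summarize(raw []byte) ([]ContainerSummary, error) {
	var entries []inspectEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, fmt.Errorf("decode inspect output: %w", err)
	}
	out := make([]ContainerSummary, 0, len(entries))
	for _, e := range entries {
		out = append(out, ContainerSummary{
			Name:      strings.TrimPrefix(e.Name, "/"), // Docker prefixes names with "/"
			Status:    e.State.Status,
			CPUShares: e.HostConfig.CPUShares,
			MemoryMiB: e.HostConfig.Memory >> 20,
		})
	}
	return out, nil
}

func main() {
	// Sample data only; values are made up for illustration.
	sample := []byte(`[{"Name":"/newdb_production","State":{"Status":"running"},
		"HostConfig":{"CpuShares":4096,"Memory":17179869184}}]`)
	summaries, err := summarize(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, s := range summaries {
		fmt.Printf("%s status=%s cpu-shares=%d memory=%dMiB\n",
			s.Name, s.Status, s.CPUShares, s.MemoryMiB)
	}
}
```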
Some operational concerns
Megabase comes with a number of operational challenges that we’ve had to resolve.
When we first attempted to run containerized databases under a proper scheduler, our scheduler would “ungracefully” kill database instances.
Originally, we used the docker stop command to send a SIGTERM signal to a running container, asking it to terminate. If after a period of time it hadn’t terminated, docker stop sent a SIGKILL signal that forced termination. In this scenario, we didn’t know how long it would take a database to cleanly shut down, release memory, and flush its data.
Megabase instead uses a wrapper script to stop databases by sending the appropriate signal to the database process inside the container. For example, PostgreSQL accepts SIGTERM (a smart shutdown), SIGINT (a fast shutdown), or SIGQUIT (an immediate shutdown), depending on what shutdown behavior we require: wait for connections to close gracefully, force close connections, or shut down immediately (typically only in emergency procedures).
We occasionally need to update the configuration of a database. If we need to change a parameter and can set it dynamically, we first verify the change in version control before pushing it to production. It’s critical that we prevent service interruptions, so if a parameter change requires a restart or a SIGHUP signal, we use our command line tools to restart each instance and fail over each database one at a time.
If we need to resize the volume available to a container, we leave the container online and simply extend the underlying LVM volume and resize the filesystem.
We can also dynamically resize the CPU shares and memory limits for containers with the docker update command.
Each container image has a script that checks for existing data on the underlying volume when the container starts. If the script finds no data, it pulls down a consistent snapshot from another running instance to bootstrap replication.
If we restart a container, the data in the database will persist, but not if the host fails. To ensure we don’t lose any data, we make a backup of every database and send it offsite at least once a day. We back up databases with high-value data much more frequently. We have been evaluating container images with continuous archiving tools such as WAL-E. If a container fails, we can restore a copy of the data to a local, known location, and the container will find it on startup and proceed as usual.
As mentioned in part one of this series, we use CPU shares, memory limits, and Logical Volume Manager (LVM) volumes to isolate container resources as much as possible. We take care to place high-throughput instances on dedicated host clusters to avoid contention for network and storage resources. And we keep a close eye on upstream features coming from Docker, should it release anything that could help mitigate these concerns.
CPU pinning is another method for isolating workloads on multi-tenant servers. Because pinning containers to a set of CPUs would require more host-level resource management and sophisticated scheduling than was in scope for Megabase, we decided against this method. We may revisit that decision at a later point when we add support for more performance-sensitive databases.
We manage the persistent storage available to containers with LVM. For the data that lives in the /var/lib/docker/ directory, we create a large thin pool of free space on each Megabase host that a container volume can draw from as needed.
There are a couple of important things to remember when using Docker’s devicemapper storage driver in direct-lvm mode. First, it should be configured only on a new host with no other running containers. Second, it does not allow you to use multiple block devices. Instead, we use Ansible to automate the configuration of direct-lvm mode. We store database files on a separate LVM volume (e.g., /data/newdb_production/) that we bind mount inside each container at deploy time.
Observability and monitoring for Megabase
Megabase hosts run the New Relic Infrastructure agent to collect the most vital server metrics. Our deployments of the Infrastructure agent include on-host integrations to collect data directly from our databases as well as the underlying hardware. We also collect additional metrics from MySQL and PostgreSQL directly and send them as custom events to the New Relic Insights API.
We use both Infrastructure and Insights metric-based alerts to notify our team about alarming disk use, abnormally high I/O wait times, unexpected restarts, or single- and multi-bit memory errors. Our team leverages the full power of NRQL to craft custom dashboards and alerts for database availability, throughput, replication lag, and other Service Level Indicators (SLIs).
One of our dashboards uses our custom events for database connection and query failures over time and distills them into a single availability value for our users (as shown in the following image). These numbers help our teams understand reliability and risk at a higher level and should ultimately allow us to provide a higher level of reliability for New Relic customers.
Another recent set of Insights dashboards we’ve created lets us view resource usage by cluster, host, and container to help us proactively spot “noisy neighbor” problems. Since we’ve not yet built out automated host resource management or container scheduling, monitoring for noisy neighbors is critical to our ability to provide consistent performance Service Level Objectives (SLOs) for Megabase users.
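For the availability dashboard mentioned above, one simple way to reduce failure events to a single number is the fraction of attempts that succeeded. The Go sketch below assumes exactly that definition; the post does not give the actual formula or NRQL query behind the dashboard, so treat both the formula and the `availability` function as illustrative.

```go
package main

import "fmt"

// availability returns the fraction of attempts that did not fail.
// Assumption: availability = (attempts - failures) / attempts.
func availability(attempts, failures int64) (float64, error) {
	if attempts <= 0 || failures < 0 || failures > attempts {
		return 0, fmt.Errorf("invalid counts: attempts=%d failures=%d", attempts, failures)
	}
	return float64(attempts-failures) / float64(attempts), nil
}

func main() {
	// Made-up counts for one reporting window.
	a, err := availability(1_000_000, 250)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("availability: %.3f%%\n", a*100)
}
```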
Next steps for improving Megabase
Megabase helps us deploy resource-efficient databases faster and more consistently. As I mentioned in part one of this series, we have deliberately avoided the added complexity of container scheduling and orchestration in favor of a small set of tools to aid with deployment and lifecycle management.
It is likely that we’ll adopt an orchestration solution once open-source projects (such as Kubernetes and DC/OS) offer enough support to meet our needs. In the meantime, we plan to keep adding support for other data stores and to harden our existing tools by continuing to use them in production.
We’re also working to improve how we handle backups and failovers. We believe that having strong stories for backups, restoration, and failovers allows us to provide high levels of availability and durability for our database users, giving them confidence during any failure scenarios they may encounter.
While our database team now speaks a common language when it comes to deployment, this is an ongoing project that requires continual iteration and development. That’s the best part of our jobs: we move forward.
(Don’t miss part one of this series: Containerizing Databases at New Relic: What We Learned.)