How do you move more than 6000 customers in 2 hours? (Hint: plan ahead)

Posted in New Relic News, 26 January 2011

As many of you know, we undertook a major data center move last November. The move went quite smoothly, and we thought a blog post or two about how we did it might be useful to the community. We cajoled our IT Director Bayard Carlin into writing two posts that discuss the planning and execution of our effort to move live production performance data for 6000+ customers. This post covers the efforts that went into planning the move, and the second post will detail events as they unfolded the night of the move. Without further ado…

Egads, I wish I could sleep but how am I going to move the data? (Part 1)

by Bayard Carlin

First, a huge thanks to those involved

The move was a pretty complex project with a lot of planning, evaluation and testing. We did a lot of rehearsing before the actual cutover. Many thanks to the folks in our engineering department who contributed to the project. We also want to thank the folks at Percona as well as Tyler Poland at Engine Yard. They played a big part in our success.

The challenge

One of the most difficult problems to solve was how to move several terabytes of hot data across the country without significant downtime or gaps in data collection. This was keeping me up at night. The official company target was a 6 hour maintenance window. I wanted to be collecting customer data within one hour. We almost made it–I think the actual Collector outage was about one hour and 10 minutes. It took us about 2 hours to get the UI up.

We had two viable options to move the data across: move customers programmatically a batch at a time or use MySQL replication. Some of us were a bit skeptical that we could replicate across the Internet because of the amount of rows we write (1.4 billion a day). Moving the customer data programmatically was pretty complex and fraught with its own issues.

I am an Ops guy; I understand replication, I have used it before, and I trust it. Replication would also be cleaner, as we could do a knife-switch cutover, so it was my preferred method.

The general steps for replication are: get an image of the database, start a MySQL instance from that image, and enable replication. Most of the time, mysqldump/restore is used to get the image.
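
For reference, a conventional mysqldump-based setup is sketched below (host names and paths are placeholders); as explained later, this approach would not work for us.

On the master:
#mysqldump -u root -p --all-databases --single-transaction --master-data=2 > dump.sql
#scp dump.sql <hostname_of_replica>:/tmp/

On the replica:
#mysql -u root -p < /tmp/dump.sql

The --master-data=2 option records the master's binlog file and position as a comment at the top of the dump, which is what you later feed to change master to.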

A bit about the New Relic database architecture

New Relic uses MySQL 5.0. We have a sharded database architecture for our metric data and a central database for all the account data. Customers are mapped to specific shards, which are around 500 GB each. As mentioned above, we insert around 1.4 billion rows a day across all our shards. We have a multi-tenant application and each customer gets their own set of tables. We have *lots* of tables, at least 120,000 per shard.

The tables that contain data for the last hour are getting written to. These tables are extremely hot and tend to be large. Every hour we roll them over, create a new set and drop the tables that are no longer necessary.  The tables containing historical data are static unless we are processing them.  Every night we also purge tables that have expired.
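
Conceptually, the hourly roll-over boils down to something like the following (the table names are made up for illustration; the real scheme has more to it):

On each shard, in the mysql client:
#create table metrics_hour_15 like metrics_hour_14;
#drop table metrics_hour_03;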

These factors make it very difficult to use normal methodologies to back up and restore the databases. We delete tables every hour, mysqldump takes more than an hour to run, and mysqldump will fail if tables are dropped during a backup.

Another issue that prevents us from doing a normal dump is that the metric tables that are getting written to are way too hot to lock. I tried Xtrabackup but it failed because we were dropping tables. We have a way to back up our shards but that is out of scope for this posting. Essentially we don’t back up the hot tables and have a wrapper that keeps mysqldump from borking when tables are dropped. Our backups unfortunately are not suitable for replication.
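
The wrapper itself is out of scope, but the basic idea of leaving the hot tables out of a dump can be illustrated with mysqldump's --ignore-table option (the database and table names here are placeholders):

#mysqldump -u root -p --ignore-table=<shard_db>.<hot_metric_table> <shard_db> > shard_backup.sql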

Replicating the accounts DB was a no brainer. I used Xtrabackup to dump the DB, copied the file across and executed xtrabackup with the --prepare option. The Xtrabackup output gave me the correct binlog and log position and I was off to the races. Xtrabackup is orders of magnitude faster than mysqldump. You can find information on Xtrabackup here.
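
The exact commands depend on your Xtrabackup version, but the flow was roughly the following (paths are placeholders):

On the master:
#xtrabackup --backup --target-dir=/backups/accounts
#rsync -avz /backups/accounts <destination_host>:/backups/

On the new master:
#xtrabackup --prepare --target-dir=/backups/accounts

The prepare step applies the InnoDB logs so the data files are consistent, and its output reports the binlog file and position to plug into change master to.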

Proof of concept

Our first step was a proof of concept to see if replication was actually possible. We decided to test things out with one shard.

In most cases mysqldump/restore is used for replication, but as explained above this would not work for us. In the end we decided that LVM snapshots were the best option and we would rsync the data across to the new data center. If you do a snapshot of a running database, you need to do a FLUSH TABLES WITH READ LOCK before you snap, or you have to dump/restore prior to replication. For a variety of reasons the flush tables may have been problematic with our app and we didn't want to risk it. I also didn't want to take the time for a dump and restore, so I stopped the DBs prior to the snapshot. Having a sharded architecture made it possible for us to do this without affecting all our customers at once. The outages were generally sub 5 minutes per shard; the first snaps (ahem) took longer because we hit some snags.

Here is an outline of the steps we used. This is not meant to be a complete tutorial on replication. I would strongly recommend reading up on replication or use a consulting service like Percona to help you out. Having a data center migration go sideways would not be career enhancing. Docs on replication can be found here.

Some prereqs

Set up ssh keys on the master and replica hosts. This is necessary for passwordless rsync, and since we are replicating over the Internet we need an ssh tunnel to pass data back and forth securely.
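
Setting up the keys is the usual routine, something like this on whichever host initiates the rsync and the tunnel:

#ssh-keygen -t rsa
#ssh-copy-id root@<hostname_of_replica>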

Make sure the MySQL root passwords match on the master and slave (this is done for you if you rsync the entire MySQL directory across).

Edit the my.cnf on the master to enable binlogging and set a unique server ID. If binlogging is already enabled and you are not replicating to another DB you will want to delete all the binlogs after shutting mysql down and prior to doing the snapshot.
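
The relevant my.cnf settings are only a couple of lines; the server IDs just need to be unique (the values below are arbitrary):

On the master:
[mysqld]
server-id = 1
log-bin   = mysql-bin

On the replica:
[mysqld]
server-id = 2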

We do not use localhost or bind our mysql instances to the server’s IP address; we have a unique IP per instance. The first thing we needed to do was add the correct permissions for root so the remote DB can be administered.

On the master in the mysql client:
#grant all on *.* to 'root'@'<hostname_of_replica>' identified by '<somepassword>' with grant option; flush privileges;

Then stop mysql
#/etc/init.d/mysql stop

Do the snap
#lvcreate -L <size> -s /dev/<original-device> -n <new-device>

Mount it
# mount -o nouuid,ro /dev/<new-device> /mnt/backup

Check the snapshot. You will want to have an idea of whether the correct files are there, so you need to do a compare. Since MySQL is down, we wanted to make it quick.
#rsync -avz --dry-run <source_directory> /mnt/backup

If there is no output you are golden.

Start mysql again
#/etc/init.d/mysql start

Now do the rsync. We used screen, as the rsyncs took a long time. We wanted notifications when the syncs were finished, and I also wanted to be able to see how things were going, so I logged the screen output.
# screen -S rsync -L
#rsync -avz /mnt/backup <destination_host>:<destination_directory> && echo "Snapshot copy done" | mail -s "`hostname` snapshot copy is done" <your_email@your_domain.com>

If bandwidth or disk IO is an issue you can use the --bwlimit flag.
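
For example, to cap the transfer at roughly 20 MB/s (--bwlimit is in KBytes per second):
#rsync -avz --bwlimit=20000 /mnt/backup <destination_host>:<destination_directory>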

Now it is time to eat dinner, go to a movie and get a good night’s sleep. With 500 GB and 120,000 files going over the Internet, this is going to take a while.

Next we want to allow the slave to replicate from the master. On the master in the mysql client:
#grant replication client, replication slave on *.* to '<replication_user>'@'<hostname_of_replica>' identified by '<replication_password>';

When the rsync is finished, log into the slave and execute:
#/etc/init.d/mysql start && tail -f <path_to_logs>/mysql.err

Assuming there are no errors in the log, you are ready to start replication. Set up the bidirectional SSH tunnel. We set up the tunnel on the remote server and decided to use port 13306 on both ends. It is a good thing to script this into a loop so the tunnel comes back if the connection drops:
#ssh -N -v -L13306:127.0.0.1:3306 -R13306:127.0.0.1:3306 <remote_host>
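
A minimal version of that loop, assuming the same port numbers as above, looks like this:

#!/bin/bash
# Keep the replication tunnel up; if ssh exits, wait a bit and reconnect.
while true; do
  ssh -N -L13306:127.0.0.1:3306 -R13306:127.0.0.1:3306 <remote_host>
  echo "`date`: tunnel dropped, reconnecting in 10 seconds" >&2
  sleep 10
done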

On the slave in the MySQL client: (You need to use the same <replication_user> and <replication_password> as you did with the master.)
#change master to master_host='<hostname_of_master>', master_port=13306, master_user='<replication_user>', master_password='<replication_password>', master_log_file='<name_of_first_binlog>', master_log_pos=<correct_log_position>;
#start slave;

Also, you will want to make sure nothing is writing data to the slave or replication will get borked. Yes, I speak from experience.
#set global read_only = on;

Now you can see how replication is doing:
#show slave status \G

For an exact explanation of the output, please see the MySQL documentation here.

The initial test went well. It took about 25 hours to sync all the data across. Having a large number of small files did not help us any. I brought up the new instance and enabled replication without issue. I think it took about 8 hours for all the changes to catch up.

Rinse, lather, repeat

We were very encouraged with the results. We kept replication going for a couple of weeks without issue. Replication for the most part kept current and when it fell behind it caught up rather quickly. Once we proved this method would work we set it up on the rest of our shards. We did 2 a night. We kept replication running for a week or so prior to the cutover to make sure everything was stable.

Nic Benders, one of our Dev/Ops guys, cooked up a script that would check replication status across a number of databases. A slightly modified (to protect the innocent) version is at the bottom of the post. We later wrapped this into a ruby script that sent alerts if Seconds_Behind_Master stayed above 0 for a couple of iterations, and put it into a cron job. We wanted to know about replication errors as soon as possible because redoing everything would slip the date.
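
The ruby wrapper is ours, but the alerting idea is simple enough to sketch in shell (the hostname, credentials and alert address are placeholders):

#!/bin/bash
# Alert if a shard reports a non-zero Seconds_Behind_Master on two checks in a row.
behind=0
for i in 1 2; do
  seconds=`mysql -h <hostname> -u <replication_user> --password='<replication_password>' \
    -e 'show slave status \G' | grep Seconds_Behind_Master | awk '{print $2}'`
  if [ "$seconds" != "0" ]; then
    behind=$(($behind + 1))
  fi
  sleep 60
done
if [ $behind -eq 2 ]; then
  echo "replication is falling behind on <hostname>" | mail -s "replication alert" <your_email@your_domain.com>
fi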

The steps for the cutover were all scripted, but the process looked like this:

Shut down the app.

Make sure replication was up to date on all the shards:
# show slave status \G

Shut down the master.
#/etc/init.d/mysql stop

On the accounts database and each of the slaves:

Turn off replication.
# stop slave;

Turn off mysql and do a snapshot, just in case we need to fail back. If we have an image we can start replication right where it left off. Karsten Self, another Dev/Ops guy, scripted this with a bunch of trimmings, as we had some unique issues we needed to solve.
#/etc/init.d/mysql stop
#lvcreate -L <size> -s /dev/<original-device> -n <new-device>
# mount -o nouuid,ro /dev/<new-device> /mnt/backup
#rsync -avz --dry-run <source_directory> /mnt/backup
Make the DB read/write.
# set global read_only = off;

Make the DB its own master.
# change master to master_host='';

Tighten up security a bit by removing the permissions for the old master:
#revoke all on *.* from 'root'@'<hostname_of_old_master>'; flush privileges;

Now the data was production ready on all the new servers.

Voila!

The next steps were bringing up the different tiers of the application.

All in all I was delighted with the results. The cutover was smooth, easy and efficient.

Be sure to stay tuned for the next exciting episode, where we discuss creating the playbook and the actual execution of the switchover.

Nic’s check replication across multiple hosts script:
#!/bin/bash
# Check replication lag (Seconds_Behind_Master) across a set of database hosts.

for db in A B C; do
  case "$db" in
    "A")
      hostname="<hostname A>"
      ;;
    "B")
      hostname="<hostname B>"
      ;;
    "C")
      hostname="<hostname C>"
      ;;
  esac
  printf "%-16s " $hostname
  # Pull the Seconds_Behind_Master line out of SHOW SLAVE STATUS.
  status=`mysql -h $hostname -u <replication_user> --password='<replication_password>' mysql -e 'show slave status' -E 2>/dev/null | grep Seconds_Behind_Master | awk '{sub(/^[ \t]+/, "")};1'`
  if [ -n "$status" ]; then
    echo -en "$status"
    seconds=`echo $status | awk '{print $2}'`
    if [ -z "$seconds" ]; then
      echo
    elif [ "$seconds" == "NULL" ]; then
      # NULL means the slave threads are not running.
      echo -e "\t--:-- (hh:mm)"
    elif [ $seconds -lt 60 ]; then
      echo -e "\tCURRENT"
    elif [ $seconds -gt 0 ]; then
      minutes=$(($seconds / 60))
      hours=$(($seconds / 3600))
      minutes=$(($minutes - $(($hours * 60))))
      printf "\t%d:%02d (hh:mm)\n" $hours $minutes
    else
      echo -e "\t???"
    fi
  else
    # The query failed (host down or permissions problem); leave the line blank.
    echo
  fi
  echo
done
