Moving your data is one of the trickiest parts of a cloud migration. During the migration, the location of your data can have a significant impact on the performance of your application. If you don’t migrate the data at the same time you migrate the services using the data, you risk needing to access your data over a distance between your on-premise and your cloud data centers, which can certainly cause latency and throughput issues.
Additionally, during the data transfer, keeping the data intact, in sync, and self-consistent requires either tight correlation or—worse—application downtime. The former can be technically challenging for your teams performing the migration, but the latter can be unacceptable to your overall business.
Moving your data and the applications that utilize the data at the same time is necessary to keep your application performance acceptable. Deciding how and when to migrate your data relative to your services, though, is a complex question. Often companies will rely on the expertise of a migration architect, which is a role that can greatly contribute to the success of any cloud migration.
Whether you have an on-staff cloud architect or not, there are three primary strategies for migrating application data to the cloud:
- Offline copy migration
- Master/read replica switch migration
- Master/master migration
It doesn’t matter if you’re migrating an SQL database, a noSQL database, or simply raw data files—each migration method requires a different amount of effort, has a different impact on your application’s availability, and presents a different risk profile for your business. As you’ll see, the three strategies are fairly similar, but the differences are in the details.
Strategy 1: offline copy migration
An offline copy migration is the most straightforward method. Bring down your on-premise application, copy the data from your on-premise database to the new cloud database, then bring your application back online in the cloud.
An offline copy migration is simple, easy, and safe, but you’ll have to take your application offline to execute it. If your dataset is extremely large, your application may be offline for a significant period of time, which will undoubtedly impact your customers and business.
For most applications, the amount of downtime required for an offline copy migration is generally unacceptable. But if your business can tolerate some downtime, and your dataset is small enough, you should consider this method. It’s the easiest, least expensive, and least risky method of migrating your data to the cloud.
Strategy 2: master/read replica switch migration
The goal of a master/read replica switch migration is to reduce application downtime without significantly complicating the data migration itself.
For this type of migration, you start with your master version of your database running in your on-premise data center. You then set up a read replica copy of your database in the cloud with one way synchronization of data from your on-premise master to your read replica. At this point, you still make all data updates and changes to the on-premise master, and the master synchronizes those changes with the cloud-based read replica. The master-replica model is common in most database systems.
You’ll continue to perform data writes to the on-premise master, even after you’ve gotten your application migrated and operational in the cloud. At some predetermined point in time, you’ll “switchover” and swap the master/read replica roles. The cloud database becomes the master and the on-premise database becomes the read replica. You simultaneously move all write access from your on-premise database to your cloud database.
You’ll need a short period of downtime during the switchover, but the downtime is significantly less than what’s required using the offline copy method.
However, downtime is downtime, so you need to assess exactly what your business can handle.
Strategy 3: master/master migration
This is the most complicated of the three data migration strategies and has the greatest potential for risk. However, if you implement it correctly, you can accomplish a data migration without any application downtime whatsoever.
In this method, you create a duplicate of your on-premise database master in the cloud and set up bi-directional synchronization between the two masters, synchronizing all data from on-premise to the cloud, and from the cloud to on-prem. Basically, you’re left with a typical multi-master database configuration.
After you set up both databases, you can read and write data from either the on-premise database or the cloud database, and both will remain in sync. This will allow you to move your applications and services independently, on your own schedule, without needing to worry about your data.
In fact, to better control your migration, you can run instances of your application both on-premise and in the cloud, and move your application’s traffic to the cloud without any downtime. If a problem arises, you can roll back your migration and redirect traffic to the on-premise version of your database while you troubleshoot the issue.
At the completion of your migration, simply turn off your on-premise master and use your cloud master as your database.
It’s important to note, however, that this method is not without complexity. Setting up a multi-master database is quite complicated and comes with the risk of skewed data and other untimely results. For example, what happens if you try and update the same data simultaneously in both masters? Or what if you try to read data from one master before an update to the other master has synchronized the data?
As such, this model only works if your application’s data access patterns and data management strategies can support it. You’ll also need application specific synchronization and sync resolution routines to handle sync-related issues as they arise.
If your application, data, and business can handle this migration method, consider yourself fortunate and use it. It’s the cleanest and easiest of the three strategies.
Mitigate migration risks
Any data migration comes with some risk, especially the risk of data corruption. Your data is most at risk while the migration is in progress; swift and determined execution of the migration is critical. Don’t stop a migration until you have completed the process or you have rolled it back completely. And never stop a migration halfway through—half-migrated data isn’t useful to anyone.
Risk of data corruption is especially high when migrating extremely large datasets. Offline data copy and transfer tools such as AWS Snowball can help manage the migration of large quantities of data, but they do nothing to help with your application’s data usage patterns during a migration. Even if you use a transfer device such as Snowball, you’ll still need to use one of the migration strategies described above.
As is true with all migrations, you won’t know if you encounter a problem if you can’t see how your application is performing before, during, and after the migration. Maintaining application availability, and keeping your data safe and secure, can only happen if you understand how your application is responding to the various steps in the migration process.
As such, monitoring your application during all aspects of the migration with the New Relic platform, including New Relic APM and New Relic Infrastructure, will help keep your application safe and secure, and your data corruption-free. This is critical for all aspects of your migration, not just your data migration.