When Pantheon migrated its platform to Google Cloud Platform (GCP) in 2017, the Drupal and WordPress website management service moved some 200,000 customer websites in just two weeks. Not only did the migration go completely unnoticed by customers, but Pantheon also saw a 40% drop in its cloud provider costs.
According to Josh Koenig, Pantheon co-founder and CTO, “New Relic was instrumental in our ability to successfully complete the migration process.” The firm has long relied on New Relic’s data to help understand how its customers’ websites are running, and includes New Relic APM in every site deployment. New Relic became even more critical as the firm rapidly moved tens of thousands of high-traffic, frequently updated customer websites from its existing provider to GCP without negatively affecting any of those customers.
Migrating with confidence
“I was actually amazed,” Koenig says. “I expected it to be much, much harder, and there to be many more problems.” Koenig credits three months of extensive planning, proof-of-concept work, and research that gave Pantheon confidence that the sites would work on GCP, and that they would perform as well as or better than they did before the migration. Explains Koenig, “Before we moved a single workload over to GCP, our engineering team mapped out a detailed plan for how we would successfully migrate, including establishing baseline metrics and acceptance criteria.”
“Where things get scary for us is understanding the behavior and performance of our customer applications, because if those get degraded, we lose customers,” Koenig says. “And it’s virtually impossible to control for all the different combinations of code and plug-ins and modules and use cases” across Pantheon’s customer base. While GCP offered newer infrastructure than Pantheon’s previous provider, Koenig was worried about the potential effects of various differences, such as the impact of network-attached storage on database latency.
Only because of New Relic
“What we could do only because of New Relic,” Koenig says, was to start spot-checking the performance of various elements, from PHP sites to databases, and to begin understanding where the migration would likely affect baseline metrics, including response time, error rates, and throughput.
“We took some of our more demanding use cases and load tested them, and we actually started to find some tuning aspects that we needed to fix. We then moved a small group of pilot customers to GCP and let them run there for two weeks so we could observe under live traffic, which is never exactly the same as a load test. New Relic data was essential in that context because it gave us a really easy way to get a rich data set in a test,” Koenig says. Having data that was consistently sourced and formatted, a true apples-to-apples comparison, “was key in building the confidence necessary to attempt the migration at the pace that we ended up doing it,” Koenig notes.
Keeping it real
“When I say we did all that in two weeks,” Koenig allows, “it wasn’t two weeks of steady throughput. There were a couple of points where we paused for 24 hours to fix something. But the precision we were able to get from New Relic enabled us to ensure that our customers had uninterrupted, high-quality service throughout the migration.”
In fact, Pantheon didn’t receive a single customer support call due to the migration and didn’t open a single ticket with GCP during the process. “But while we say that nobody noticed,” Koenig acknowledges, “the truth is that we noticed some things.” Pantheon identified a few edge cases that caused performance regressions.
Using New Relic, he says, “we were able to do a quick root cause analysis and fix the problems before any of our customers noticed. Not only were we able to avoid any major incidents, but due to newer CPUs and faster networking, we were able to deliver a 5% performance improvement to customers.” Those improvements were verified using detailed before/after performance data obtained with New Relic.
To learn more about how Josh and his team migrated to the cloud with confidence, be sure to read the full customer case study: Pantheon Confidently Accelerates Migration to Google Cloud Platform Using New Relic.