Strength in Numbers is a new blog series looking at the ways Covid-19 has impacted New Relic customers, and how they’ve been adapting to the disruption.
At the start of 2020, Chegg knew it was going to be a busy year.
The high-growth, online learning platform—which offers on-demand access to a variety of educational resources, including digital study aids, math and writing helpers, and subject-specific tutoring for mostly high school and college students—was already consistently breaking its own traffic records semester after semester.
Then, in March, colleges and universities across the country began closing their campuses and sending students home over fears of Covid-19 outbreaks, pushing the entire educational system and campus culture further online.
While lectures were well suited to live streaming, activities like on-campus study groups and individual tutoring sessions all but evaporated. As a result, Chegg saw record traffic, and demand for its services surged beyond its initial estimates. Luckily, the company was ready.
Chegg takes its relationship of trust with its students incredibly seriously. The company needs to earn students’ business every month, and it can’t afford to break that bond. If a single student is impacted by an outage when they’re in the middle of studying for a midterm, they’ll be much less likely to renew their Chegg subscription the following semester.
Because the pandemic hit in the middle of spring, just as students were beginning to eye the back half of the semester, Chegg knew its services had to work when they were called upon. That’s why the company put its trust in New Relic One.
Instrumenting for the future
When Steve Evans, Vice President of Engineering Services, joined the company in 2017, he saw a lot more potential in Chegg’s instance of New Relic. The company had deployed New Relic APM to monitor performance data like response times and error rates, but many more capabilities were waiting to be unlocked. “It’s like we had a Ferrari sitting in rush hour traffic,” he says.
In January 2018, a rare outage—caused by a frontend page issuing too many API calls to a backend system, which in turn brought down a database—made a deeper exploration of the full capabilities of New Relic One a top priority.
Chegg called on New Relic’s support team to provide enablement sessions for the company’s engineers. Once they were better acquainted with the New Relic One capabilities, the team spent a good part of 2019 determining what data would best help them reduce their mean time to resolution (MTTR) as they continued to scale.
For example, in early 2020, Chegg migrated to New Relic Logs from a platform that imposed daily log quotas. Given the traffic Chegg was seeing during the early days of the pandemic, Evans says it would’ve hit its logging caps halfway through each day. Because New Relic Logs has no daily caps, the team can view log messages in context with event and trace data at all times.
Chegg’s infrastructure runs on hundreds of hosts in AWS, with about 80% of the compute workload containerized with Docker via Amazon Elastic Container Service (ECS). In total, the company has more than 500 services in production, all instrumented with New Relic.
“New Relic’s really been that observability platform that’s allowed us to detect, measure, and iterate as we’ve been getting ourselves to a place where the big focus has been reducing MTTR,” Evans says. “And when you’re a company that’s growing 30% year-over-year, you tend to have a lot of incidents related to load. And we’ve stopped having those issues.”
In fact, Evans saw MTTR shrink in 2018 from 197 minutes to 33 minutes. In 2019, MTTR moved even lower to 24 minutes.
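To put those numbers in perspective, here is a minimal Python sketch of the arithmetic. The MTTR figures (197, 33, and 24 minutes) come from the article; the function names and the sample incident durations are hypothetical and only illustrate how a mean time to resolution and its year-over-year improvement might be computed. This is not Chegg's or New Relic's actual method.

```python
from statistics import mean


def mean_time_to_resolution(durations_minutes):
    """Average resolution time, in minutes, over a list of incidents."""
    if not durations_minutes:
        raise ValueError("need at least one incident duration")
    return mean(durations_minutes)


def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# MTTR values in minutes, as quoted in the article.
MTTR_2018_START = 197
MTTR_2018_END = 33
MTTR_2019 = 24

if __name__ == "__main__":
    # Hypothetical incident durations, chosen only to show the averaging step.
    sample_incidents = [20, 30, 22]
    print(f"Sample MTTR: {mean_time_to_resolution(sample_incidents):.1f} min")

    print(f"2018 reduction: {percent_reduction(MTTR_2018_START, MTTR_2018_END):.1f}%")
    print(f"2019 reduction: {percent_reduction(MTTR_2018_END, MTTR_2019):.1f}%")
    print(f"Overall reduction: {percent_reduction(MTTR_2018_START, MTTR_2019):.1f}%")
```

By this calculation, the 2018 drop alone works out to roughly an 83% reduction in MTTR, and the combined drop through 2019 to roughly 88%.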
Trends from the impact of Covid-19
As students were sent home after Covid-19 was officially declared a global pandemic, Chegg saw three trends of note.
First, there was an uptick in interest in all of its services across the board.
“Students at home are not getting the same experience in class or on campus, which means they’re now depending on other resources more,” Evans says. “They have writing centers on campus. Well, those writing centers are gone. So students are now looking for other solutions online, and if you already had a relationship with us, and you were looking for writing help, you ended up on our writing tools product. At the end of the day, students are just trying to pass classes.”
Second, during a traditional semester, Chegg generally saw low traffic on Fridays and Saturdays and the most demand on Sunday nights, especially during finals and midterms. Once students went home, however, their use of Chegg for studying shifted toward a steadier Monday-through-Friday, 9 a.m. to 5 p.m. pattern.
Finally, Chegg began seeing a lot more international traffic, which sometimes made it challenging to deliver consistent platform performance. “The New Relic Browser work we did beforehand allowed us to deliver a high level of performance globally,” Evans says.
Gearing up for back to school
Where other companies’ applications and services might’ve buckled under the unpredictable shifts in demand caused by the pandemic, Chegg stood strong. The work Chegg had done with New Relic to fortify its customer experiences and strengthen trust with its customers more than paid off during the most intense moments of the spring. “It was a non-event for our engineering teams,” Evans says.
That work also allowed the engineering team to focus on its roadmap for the summer, so the company is ready for the new demands of the fall semester.
“If we would have been firefighting in March, April, May,” Evans says, “that means these initiatives in June and July would have been at risk. Ultimately we just would not have been in a position to execute to the level that we want, which then puts us in a position where we’re not as prepared for the fall when the students come back to school, as we would be otherwise.”
Now Chegg feels confident as it looks towards the rest of the year, whatever that might bring.
“It’s a pretty stressful time already,” Evans says. “If we were having issues just because we didn’t plan for the load, we would not have been able to capitalize on the opportunity in front of us.”
Get a deeper dive into Chegg’s use of New Relic One, including how it’s beginning to develop its own custom applications and how it’s pushing more toward serverless, by reading our customer story.