At RailsConf 2012, we sat down with five of our customers (including Jesse Proudman, panel moderator and CEO of the Blue Box Group) and asked them about the ups and downs of scaling massive websites. In the session, they discussed how to manage millions of unique visitors, unexpected traffic bursts and more. In this interview, Tim Sturge, Director of Engineering at Zendesk, graciously answered a few remaining questions we weren’t able to get to during the live panel discussion.
NR: What are the three most actionable items you should pay attention to as your application grows?
TS: I’d recommend paying attention to:
* Maintaining Rails performance
* Scalability of infrastructure services
* Hardware and VM capacity planning
NR: What two items caught your team by surprise?
TS: These were:
* Internal MySQL scalability bottlenecks. (They were hard to see coming and often had sudden cliffs.)
* The amount of overhead that running on a VM costs versus bare metal hardware. (A 20 to 30% performance gain on bare metal.)
NR: Can you walk us through your capacity planning process?
TS: We estimate the number of requests per app server we can manage on a weekly basis and extrapolate our growth to a number of app servers. On the database side we look at capacity (since we use FusionIO, capacity is a more important metric to us than IOPS) and estimate the size per object and the number of objects.
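As a rough illustration of the arithmetic behind this kind of estimate, here is a minimal Python sketch. It is not Zendesk's actual process: the function names, the compounding weekly growth model, and the headroom parameter are all assumptions made for the example, and any numbers you plug in are your own.

```python
import math


def project_weekly_requests(current_weekly_requests, weekly_growth_rate, weeks):
    """Extrapolate weekly request volume, assuming compounding weekly growth.

    The compounding model is an assumption; the interview does not say
    how growth is extrapolated.
    """
    return current_weekly_requests * (1 + weekly_growth_rate) ** weeks


def app_servers_needed(weekly_requests, requests_per_server_per_week, headroom=0.0):
    """Number of app servers required for a given weekly request volume.

    `headroom` is a hypothetical safety margin (0.5 means 50% spare capacity).
    """
    required = weekly_requests * (1 + headroom) / requests_per_server_per_week
    return math.ceil(required)


def storage_bytes_needed(object_count, avg_bytes_per_object):
    """Database capacity estimate: size per object times number of objects."""
    return object_count * avg_bytes_per_object
```

For example, `app_servers_needed(project_weekly_requests(current, rate, 26), per_server)` would give a server count six months out under these assumptions, while `storage_bytes_needed` covers the database side, where capacity rather than IOPS is the metric being tracked.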
NR: How many people are accessing your site on mobile devices and how do you optimize for that?
TS: We have a lot of mobile usage, all of it through the API. We don’t really do anything special here (except for making sure the API is comprehensive and performant).
NR: How do you solve the data challenge and what do you do with the data you collect?
TS: We use a variety of solutions. We have instrumented the application to emit metrics, which we queue in a local Redis instance and then push in bulk to a backend store to be processed with MapReduce.
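The queue-locally-then-push-in-bulk pattern described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory `deque` stands in for the local Redis instance, the `sink` callable stands in for the backend store, and the `MetricBuffer` class, its method names, and the record format are all hypothetical.

```python
import json
import time
from collections import deque


class MetricBuffer:
    """Queue metrics locally, then push them to a backend in batches.

    Assumption: a deque replaces the local Redis queue, and `sink` is any
    callable that accepts a list of serialized records.
    """

    def __init__(self, sink, batch_size=100):
        self._queue = deque()
        self._sink = sink
        self._batch_size = batch_size

    def emit(self, name, value, timestamp=None):
        # Cheap local append, so the request path never waits on the backend.
        record = {
            "name": name,
            "value": value,
            "ts": time.time() if timestamp is None else timestamp,
        }
        self._queue.append(json.dumps(record))

    def flush(self):
        # Drain the queue in batches; returns how many records were sent.
        sent = 0
        while self._queue:
            size = min(self._batch_size, len(self._queue))
            batch = [self._queue.popleft() for _ in range(size)]
            self._sink(batch)
            sent += len(batch)
        return sent
```

The application calls `emit` wherever it is instrumented, and a separate periodic job calls `flush`. Keeping the two apart means a slow or unavailable backend only delays the bulk push rather than the requests themselves.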
For more information on managing real world web apps at massive scale, watch this video of the RailsConf panel.