In today’s guest post, Miguel Angel Mingorance Fernandez, a Systems Engineer at Delivery Hero, shares how the online food ordering and delivery business scales for rapid growth with New Relic.
In 2011, Delivery Hero launched its global online food ordering and delivery business and has since grown significantly. At the current pace, we expect to see more than one billion orders on our platforms in 2020. While this is great for business, our infrastructure team needs to avoid any outages or downtime to continue delivering an amazing experience to our customers in all time zones. In order to do this, the team is investing heavily in restructuring our systems to ensure uptime and make us better equipped to launch into new regions and markets.
One of the most significant initiatives that I am working on will shift our operations from a regionalized system, where EU users connect to a platform in the EU and Asian users connect to a platform in Asia, to country-isolated systems. Our previous approach worked while we were smaller, but it was not scalable, as an outage would impact all countries in a region. With isolated locations, we can improve our local monitoring systems and better tailor our services to the unique, region-specific needs of our customers. This way, all of our regions will be able to handle a higher load.
New operational strategies always come with new challenges, and we quickly realized we needed to find some creative solutions to address them. We had to be careful in adjusting resources, such as virtual machines, databases, and caching layers, as well as allocating new resources in the cloud for each location. To do this, we are now implementing a service mesh and data layers to interconnect our environments. This will allow us to run each isolated environment separately while keeping a single view of our entire system, so we can understand how our smaller, individual systems communicate with each other. We also had to make sure that our most critical applications were refactored to work well in the new isolated cloud environment.
At the same time, we changed many of our naming conventions so that all of our systems could understand each other. We’re currently using Terraform to make sure everything is correctly connected and compatible.
We also decided to change how we test and roll out new features in each region. Different countries have different needs, and now we can customize our updates. Using this approach, we can analyze the feedback from the updates and determine how they were perceived per location, which leaves our team with less data to analyze at once. We can also run canary deployments, testing a release in particular locations before pushing it live to a key region.
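To make the canary idea concrete, here is a minimal sketch in Python of a country-by-country rollout that stops as soon as one location looks unhealthy. It is an illustration only, not our actual pipeline: the names `staged_rollout`, `RolloutResult`, `deploy`, and `is_healthy` are hypothetical, and the country codes are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class RolloutResult:
    deployed: List[str]        # countries that received the release, in order
    halted_at: Optional[str]   # country whose health check failed, if any


def staged_rollout(
    canaries: Sequence[str],
    key_countries: Sequence[str],
    deploy: Callable[[str], None],       # hypothetical: ships the release to one country
    is_healthy: Callable[[str], bool],   # hypothetical: checks metrics after the deploy
) -> RolloutResult:
    """Deploy to canary countries first, then to key countries.

    The rollout halts at the first country that fails its health check,
    so later (typically larger) countries never receive a bad release.
    """
    deployed: List[str] = []
    for country in [*canaries, *key_countries]:
        deploy(country)
        deployed.append(country)
        if not is_healthy(country):
            return RolloutResult(deployed=deployed, halted_at=country)
    return RolloutResult(deployed=deployed, halted_at=None)
```

In practice, the `deploy` and `is_healthy` callbacks would wrap whatever deployment and monitoring tooling is in place. The point of the sketch is the ordering: small locations go first, and a failed check protects the key region.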
Thanks to this shift in our infrastructure practices, our team can better optimize and prepare our systems to be more reliable, secure, and agile than before. To make this initiative happen, we used several open source tools, including Prometheus, Spinnaker, Kubernetes, and Helm. We believe that open source helps engineers interact and build better software. In fact, we have contributed to Dataform public models and even maintain a Helm repo on GitHub.
We also rely on New Relic to understand our application performance data. New Relic shows all of our connected data clearly and has robust tracing capabilities that allow us to fully understand how our systems are performing. Since we’ve isolated our regions, it’s even more important now to see all of our interdependencies and be alerted when something is wrong. We also use New Relic to see how we contribute to the business and link our software performance results to key business metrics.
Next, as we continue to promote a more secure platform, we plan to launch global user authentication for our customers. Our teams are committed to delivering an amazing experience to our customers worldwide and introducing our app in new regions, and we are thrilled that our infrastructure optimizations directly and positively impact our growth.
Check out more New Relic customer stories.