Every web site is different. Using generic benchmarks to evaluate which of the many Python WSGI hosting mechanisms is best will never give you the whole picture, because your web site’s performance depends on more than just the hosting mechanism. Web sites have many moving parts and all use external resources in different ways. Until now, there has not been a production-quality tool in the Python web community for giving deep introspection into what your live web site is doing when being used by real users. With the release of the New Relic Python agent we are changing the landscape and providing you with what will no doubt become an indispensable tool for monitoring the performance of your web site. This capability isn’t new, however: it builds on the best-of-breed support already available from our existing Ruby, Java, .NET and PHP agents. All the features that have been available for our existing agents are now available for Python!
To provide the best experience with New Relic, however, we need to go further than just supporting the Python language: we need to instrument the specific web frameworks you are using. This ensures that the information we provide is more relevant to you and gives you direct pointers into your framework and your application to help you identify issues.
This tour explores the different ways we have instrumented Django with the New Relic Python agent to extract more in-depth information about what is happening with your web requests.
Web Transaction Naming
When the New Relic Python agent is used with a web framework for which we do not provide specific instrumentation, the only way we can identify each request is by its original URL. This is problematic for a number of reasons. First, the URL has no meaning unless you are intimately familiar with how the framework or your routing rules have been set up to map URLs to handler functions within your code. Having to constantly refer back to code to understand the incoming data defeats the objective of providing results that can be quickly analyzed and evaluated. Second, the URL usually encodes information about the target of the resource. This results in an extremely large number of unique URLs, making it difficult or impossible to aggregate data about requests in a meaningful way.
To protect against this large volume of data and avoid what we call metric explosion, it is necessary to apply automated rules to collapse the data down. How this is done, however, may only make the data harder to interpret. When adding support for a web framework we therefore aim to provide targeted instrumentation that identifies the key functions which handle respective URLs, and we name the web transaction after these handler functions. In the case of Django these are the view handler functions mapped to via the routing rules configured in the ‘urls.py’ file. This means web transaction names are more relevant to you and give a direct pointer into your code. The number of unique web transaction names is also greatly reduced, so you will not become overwhelmed with the number of different metrics presented.
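To make the naming scheme concrete, here is a minimal sketch (not the agent’s actual code) of deriving a transaction name from the view handler rather than the raw URL; the ‘article_detail’ view is a hypothetical example of the kind of handler your ‘urls.py’ routing rules would map to:

```python
def transaction_name(view_func):
    # Name the transaction 'module:function' after the handler that
    # produced the response, rather than after the request URL.
    return '%s:%s' % (view_func.__module__, view_func.__name__)

# Hypothetical Django view handler, as mapped to via urls.py rules.
def article_detail(request, article_id):
    pass

# Requests for /articles/1/, /articles/2/, ... all collapse into the
# single name returned here, avoiding metric explosion.
print(transaction_name(article_detail))
```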
It isn’t actually that simple with Django, because a request may never end up being handled by a view handler. Instead, a response may be generated by a request or view middleware before the view handler is even called. This may occur, for example, when a caching middleware is being used. One also has to consider when a response middleware transforms the response returned by an earlier middleware or the view handler. A suitable view handler may not even be found, or an exception may occur, in which case you are dealing with an exception middleware. In these situations, where there are multiple actors, we name the web transaction after what is regarded as the primary source of the response. A request middleware therefore takes priority over a response middleware, but if a view handler is called it always takes priority. The web transaction name can therefore be either the name of a middleware or of the view handler. Where a URL cannot be mapped to a view handler at all, the request may be handled by the default page-not-found handler and is labeled as such.
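The priority rules above can be sketched in a few lines. This is illustrative only, not the agent’s implementation, and the middleware names are just examples:

```python
# Priority of the possible sources of a response, per the rules above:
# a view handler always wins; otherwise a request middleware beats a
# response middleware. (Illustrative sketch, not the agent's code.)
PRIORITY = {
    'view_handler': 3,
    'request_middleware': 2,
    'response_middleware': 1,
}

def choose_transaction_name(sources):
    # sources: (kind, name) pairs recorded while handling the request.
    # Return the name of the highest-priority actor involved.
    return max(sources, key=lambda source: PRIORITY[source[0]])[1]

# A caching middleware answered before any view handler was called:
print(choose_transaction_name([
    ('response_middleware', 'django.middleware.gzip:GZipMiddleware'),
    ('request_middleware', 'django.middleware.cache:CacheMiddleware'),
]))
```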
We have been able to reduce the number of metrics by aggregating based on the name of the code function which generated the response. Where, however, was the time being spent in executing those requests? As shown in the overview charts below, we are able to track time spent in the database, memcache, external services and the Python code of your application. The charts show an aggregation of all such operations across all requests during the reporting period.
We don’t perform detailed profiling of code due to the excessive overhead it would impose. Instead, detailed data is collected through targeted instrumentation of the specific things which are of interest. This data is then used to create the summaries shown in the overview charts. And we don’t stop there! That data is also aggregated against the web transactions themselves, allowing us to provide an aggregated performance breakdown over time for each named web transaction.
The performance breakdown will show time for specific classes of database queries, memcache requests and calls to external services. These are picked up through instrumentation of the database, memcache or network client libraries and are independent of Django. For Django and your application code, we record time spent in any middleware, the view handler, overall template rendering and within specific blocks within your template code separately. The performance breakdown provides a quick way of seeing where the most time is being spent in handling a request.
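As a rough sketch of the kind of aggregation behind such a breakdown (purely illustrative; the category names and sleeps are stand-ins, not what the agent records), times can be bucketed per category:

```python
import time
from collections import defaultdict

breakdown = defaultdict(float)  # category -> total seconds

class timed:
    # Context manager adding elapsed wall time to a named bucket.
    def __init__(self, category):
        self.category = category
    def __enter__(self):
        self.start = time.perf_counter()
    def __exit__(self, *exc):
        breakdown[self.category] += time.perf_counter() - self.start

with timed('Database/articles/select'):
    time.sleep(0.01)   # stand-in for an SQL query
with timed('Template/Render'):
    time.sleep(0.005)  # stand-in for template rendering

for category, seconds in sorted(breakdown.items()):
    print('%-26s %.3fs' % (category, seconds))
```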
The value of being able to track exactly where time is being spent in a transaction is best appreciated during the analysis of a specific transaction that is slowing down your application. New Relic allows you to see where time was spent and most importantly see the context of any time consuming operation in the overall call hierarchy. When SQL queries occur, the full SQL query and a stack trace of where in the code the SQL query was performed is also collected and available by drilling down on that database query.
When Django is deployed in a production environment and an exception occurs, an error response page will be generated by Django and returned to the user. Details of that exception, however, will not be logged to the server error log. To capture those details you would have had to set up ‘ADMINS’ in your ‘settings.py’ file, as well as define a mail server through which Django can deliver email; Django will then send an email for each occurrence of an exception. In our instrumentation of Django, even if you haven’t set up ‘ADMINS’, New Relic will intercept those same exceptions and pass them back for display in our UI.
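For reference, a minimal sketch of what that Django configuration would otherwise look like (the hostnames and addresses here are placeholders for illustration only):

```python
# settings.py -- what you would otherwise need for Django itself to
# email exception details (placeholder values for illustration only).
DEBUG = False

ADMINS = [('Ops Team', 'ops@example.com')]

EMAIL_HOST = 'smtp.example.com'             # mail server Django delivers through
SERVER_EMAIL = 'django-errors@example.com'  # From: address for error mail
```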
Your exceptions are all collected together in one place and you are able to drill down on each error to see the exception type, message and the stack traceback of where in your code the problem occurred. In some situations, Django will catch exceptions and convert them into a different exception. One example of this is when Django is unable to resolve the view handler based on your rules in ‘urls.py’. When this occurs, Django will raise a ‘ViewDoesNotExist’ exception. In doing that however, the original cause of the problem, including the stack traceback, is discarded by Django. In cases where Django does this, New Relic intercepts the exception prior to Django converting and losing the details of the original exception. The New Relic Python agent is therefore able to report the original error and no longer will you be left wondering what the true cause of the problem was.
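The interception can be illustrated with plain Python. This is a simplified sketch, not the agent’s implementation; ‘resolve_view’ and the ‘errors’ list are hypothetical stand-ins for Django’s view resolution and the agent’s error collector:

```python
import traceback

errors = []  # stand-in for the agent's error collector

def record_current_exception():
    # Capture type, message and traceback of the active exception
    # before the framework converts it into something less informative.
    errors.append(traceback.format_exc())

def resolve_view(name):
    # Hypothetical lookup that mimics Django resolving a view handler.
    try:
        return getattr(__import__('math'), name)
    except AttributeError:
        record_current_exception()  # keep the original cause
        # What the framework would go on to raise, losing the cause:
        raise LookupError('ViewDoesNotExist: %s' % name)

try:
    resolve_view('no_such_view')
except LookupError:
    pass

# The original AttributeError traceback survives in the collector.
print(errors[0].splitlines()[-1])
```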
Real User Monitoring
Real user monitoring instruments your web pages with special headers and footers allowing New Relic to collect client side metrics. This includes network, DOM processing and page rendering time.
When you are using Django with New Relic we are able to do this automatically and on the fly by hooking in a response middleware to perform the insertion. This information can really put things into context. Without it, you may be spending too much of your time trying to optimize your web application to improve overall user experience when, in reality, browser page loading time is the bottleneck. You may be able to get more immediate end user benefits by looking at what you are producing in your responses rather than how that information is being produced. Real User Monitoring can tell you immediately whether you are chasing the right problem.
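As a simplified sketch of that on-the-fly insertion (illustrative only; the real agent generates the actual timing scripts and handles edge cases), a response middleware can splice the header and footer into the HTML:

```python
# Placeholder scripts; the agent generates the real timing snippets.
TIMING_HEADER = '<script>/* browser timing header */</script>'
TIMING_FOOTER = '<script>/* browser timing footer */</script>'

def insert_browser_timing(html):
    # Insert the header right after <head> and the footer just before
    # </body>; leave responses without those tags untouched.
    if '<head>' not in html or '</body>' not in html:
        return html
    html = html.replace('<head>', '<head>' + TIMING_HEADER, 1)
    return html.replace('</body>', TIMING_FOOTER + '</body>', 1)

page = '<html><head><title>Hi</title></head><body>Hello</body></html>'
print(insert_browser_timing(page))
```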
The Long View
The default view provided by New Relic is the last 30 minutes. This will tell you what is happening now, in real time, but that isn’t always what is most important. Sometimes you need to be able to step back and take the long view. That is, how is your web application performing now in comparison to last week, last month or right before that last deploy? Has your site performance improved, stayed about the same, or have your attempts to fix problems just made it worse?
As with all the New Relic agents, the Python agent provides up to 3 months of historical data with our New Relic Pro product. Comparison data is also available through reports, including reports on specific web transactions.
Get Started Now
Do your current tools give you this level of detail about your web application? Know what is going on inside your web application. Monitor the activity of your real users and not just a network of test clients. See how your site performs for different browsers and identify where you can make real improvements. Look at the long view and work out whether the changes you are making are helping or just sending you backwards.
Sign up for New Relic and start getting the performance data you need!