Android Runtime Performance Analysis Using New Relic: ART vs. Dalvik

At Google I/O this year, the Android Runtime (ART) team showed off a number of impressive performance improvements. In case you missed the introduction during the keynote, you can watch the in-depth talk here. ART itself wasn’t the biggest surprise, since it has been hiding deep in the developer menu since Android 4.4. The most important announcement was that Android L (the next major release of the OS) will switch to ART by default. Let’s take a quick tour of some of the new features included with ART to see how they stack up against Dalvik, the current runtime.

ART vs. Dalvik

In perhaps the most important improvement, ART compiles your application to native machine code when it is installed on a user’s device. This approach is known as ahead-of-time (AOT) compilation, and you can expect to see large performance gains because the compilers are tuned for specific architectures (such as ARM, x86, or MIPS). It also eliminates the need for just-in-time compilation each time an application is run. As a result, your application will take slightly longer to install but will start faster, because many tasks that the Dalvik VM performs at runtime, such as class and method verification, have already taken place.

Next, the ART team worked to optimize the garbage collector (GC). Instead of two pauses totaling about 10ms for each GC in Dalvik, you’ll see just one, usually under 2ms. They’ve also parallelized portions of the GC runs and optimized collection strategies to be aware of device states. For example, a full GC will run only when the phone is locked and user interaction responsiveness is no longer important. This is a huge improvement for applications sensitive to dropped frames. Additionally, future versions of ART will include a compacting collector that will move chunks of allocated memory into contiguous blocks to reduce fragmentation and the need to kill older applications to allocate large memory regions.
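To put those pause times in context, here is a rough back-of-the-envelope sketch. It assumes a 60 frames-per-second target (about 16.67 ms per frame), which is my assumption and not a figure from the talk, and it takes the worst case where a whole collection's pauses land inside a single frame. The class and method names are made up for illustration.

```java
// Rough frame-budget arithmetic using the GC pause figures quoted above.
// Assumption: a 60 fps rendering target; the names here are illustrative only.
final class GcFrameBudget {
    // At 60 frames per second, each frame has about 16.67 ms to do its work.
    static final double FRAME_BUDGET_MS = 1000.0 / 60.0;

    // Time left in one frame if GC pauses of the given total length fall inside it.
    static double remainingMs(double gcPauseMs) {
        return FRAME_BUDGET_MS - gcPauseMs;
    }

    public static void main(String[] args) {
        System.out.printf("Dalvik (~10 ms of pauses): %.2f ms left%n", remainingMs(10.0));
        System.out.printf("ART (~2 ms pause): %.2f ms left%n", remainingMs(2.0));
    }
}
```

Under these assumptions, a Dalvik collection could leave less than 7 ms of a frame for your own code, while an ART collection leaves more than 14 ms, which is why the shorter pause matters so much for smooth rendering.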

Lastly, ART makes use of an entirely new memory allocator called RosAlloc (runs-of-slots allocator). Most modern systems use allocators based on Doug Lea’s design, which has a single global memory lock. In a multithreaded, object-oriented environment, that lock becomes a point of contention between the garbage collector and other memory operations. In RosAlloc, the smaller objects common in Java are allocated in a thread-local region without locking, and larger objects have their own locks. Thus when your application attempts to allocate memory for a new object, it doesn’t have to wait while the garbage collector frees an unrelated region of memory.
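The general idea of thread-local allocation can be illustrated with a toy sketch. This is not ART's allocator or anything close to it: the class, the size threshold, the run size, and the single lock on the large-object path are all hypothetical simplifications chosen to show why small allocations on different threads never have to wait on each other.

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration only: small requests bump a pointer in a per-thread run
// without locking; large requests go through a lock. All names and sizes
// are hypothetical and do not reflect ART internals.
final class ToyAllocator {
    static final int SMALL_LIMIT = 512;      // assumed small/large threshold
    static final int RUN_SIZE = 64 * 1024;   // assumed size of a thread's run

    // A run is a private chunk of memory plus a bump pointer.
    private static final class Run {
        final byte[] memory = new byte[RUN_SIZE]; // backing storage for the slots
        int next = 0;                             // offset of the next free slot
    }

    private final ThreadLocal<Run> localRun = ThreadLocal.withInitial(Run::new);
    private final ReentrantLock largeLock = new ReentrantLock();

    // Returns the offset of the new slot inside the calling thread's own run.
    // No lock is needed because no other thread can see this run.
    int allocateSmall(int size) {
        if (size <= 0 || size > SMALL_LIMIT) {
            throw new IllegalArgumentException("not a small allocation: " + size);
        }
        Run run = localRun.get();
        if (run.next + size > RUN_SIZE) {
            run = new Run();          // current run is full, so start a fresh one
            localRun.set(run);
        }
        int offset = run.next;
        run.next += size;
        return offset;
    }

    // Large requests are rarer, so taking a lock here is less of a bottleneck.
    byte[] allocateLarge(int size) {
        largeLock.lock();
        try {
            return new byte[size];
        } finally {
            largeLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToyAllocator allocator = new ToyAllocator();
        Runnable worker = () -> {
            int last = 0;
            for (int i = 0; i < 1000; i++) {
                last = allocator.allocateSmall(16);
            }
            System.out.println(Thread.currentThread().getName() + " last offset: " + last);
        };
        Thread a = new Thread(worker, "worker-a");
        Thread b = new Thread(worker, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Each worker thread fills its own run independently, so neither blocks the other on the small-object path. A real allocator also has to handle freeing, size classes, and coordination with the collector, none of which this sketch attempts.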

Needless to say, all of these improvements have done wonders for application performance and battery life. During the I/O keynote, this slide popped up and grabbed my attention:

[Figure: ART performance comparison slide from the I/O keynote]

Using New Relic to compare ART vs. Dalvik

These are some pretty serious claims! Since we cherish application performance here at New Relic, I decided to take a look at Dalvik vs. ART performance through our Android agent.

First, I needed a quick little application that would demonstrate a performance issue. What better place to find computationally intensive code than the Computer Language Benchmarks Game? I settled on the spectral norm demonstration and dropped the code into a basic Android application. After adding a few @Trace annotations to tell the New Relic agent to instrument some of the heavy-lifting methods, I fired up two Genymotion images, one with ART enabled and one with Dalvik. A few minutes later, I was already seeing spectacular results:

[Figure: New Relic comparison of the benchmark on ART vs. Dalvik]
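For a feel of what the instrumented benchmark might look like, here is a minimal sketch of a spectral norm calculation with @Trace on the heavy methods. This is not the actual benchmark source linked at the end of the post. The @Trace annotation declared at the bottom is a local stand-in so the sketch compiles on its own; in an instrumented app you would use the annotation shipped with the New Relic Android agent instead. The class and method names are my own.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch of a spectral norm benchmark with trace annotations on the hot methods.
final class SpectralNorm {
    // Entry (i, j) of the infinite matrix A used by the benchmark.
    static double a(int i, int j) {
        return 1.0 / ((i + j) * (i + j + 1) / 2 + i + 1);
    }

    // out = A * v
    @Trace
    static void multiplyAv(double[] v, double[] out) {
        for (int i = 0; i < v.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < v.length; j++) {
                sum += a(i, j) * v[j];
            }
            out[i] = sum;
        }
    }

    // out = transpose(A) * v
    @Trace
    static void multiplyAtv(double[] v, double[] out) {
        for (int i = 0; i < v.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < v.length; j++) {
                sum += a(j, i) * v[j];
            }
            out[i] = sum;
        }
    }

    // Ten rounds of power iteration on transpose(A) * A, as in the benchmark.
    @Trace
    static double approximate(int n) {
        double[] u = new double[n];
        double[] v = new double[n];
        double[] tmp = new double[n];
        java.util.Arrays.fill(u, 1.0);
        for (int k = 0; k < 10; k++) {
            multiplyAv(u, tmp);
            multiplyAtv(tmp, v);   // v = transpose(A) * A * u
            multiplyAv(v, tmp);
            multiplyAtv(tmp, u);   // u = transpose(A) * A * v
        }
        double uv = 0.0;
        double vv = 0.0;
        for (int i = 0; i < n; i++) {
            uv += u[i] * v[i];
            vv += v[i] * v[i];
        }
        return Math.sqrt(uv / vv);
    }

    public static void main(String[] args) {
        System.out.printf("%.9f%n", approximate(100));
    }
}

// Local stand-in for illustration; replace with the agent's own @Trace in a real app.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Trace {}
```

The nested loops make the work grow with the square of n, which is what makes this kind of code a convenient stress test for comparing runtimes.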

For this simple case, ART is more than three times as fast as the same code running on the Dalvik virtual machine. In fact, I had to run the test 10 times just to keep the numbers tangible. Here’s an individual trace on the Dalvik VM:

[Figure: individual trace on the Dalvik VM]

And here’s the same test on ART:

[Figure: individual trace on ART]

On Dalvik, each test iteration (running synchronously on its own thread) takes about 1400 milliseconds to complete. On ART, the same test takes only about 400 milliseconds. You can also see the ART version used about 2MB less memory (20% less in this case).
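The arithmetic behind those figures can be checked quickly. The sketch below uses only the rounded numbers quoted above; the helper names are made up, and the implied memory baseline is an inference from "2MB" being "20%", not a number read off the traces.

```java
// Quick arithmetic on the rounded figures quoted above; names are illustrative.
final class SpeedupMath {
    // How many times faster the improved run is than the baseline.
    static double speedup(double baselineMs, double improvedMs) {
        return baselineMs / improvedMs;
    }

    // If a saving of savedAmount is savedFraction of the baseline, this is the baseline.
    static double impliedBaseline(double savedAmount, double savedFraction) {
        return savedAmount / savedFraction;
    }

    public static void main(String[] args) {
        System.out.println("Speedup: " + speedup(1400.0, 400.0) + "x");
        System.out.println("Implied Dalvik memory (MB): " + impliedBaseline(2.0, 0.20));
    }
}
```

With 1400 ms against 400 ms, the speedup works out to 3.5x, and a 2 MB saving at 20% implies roughly 10 MB on Dalvik versus roughly 8 MB on ART.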

While my quick and dirty test is limited to CPU-intensive operations, the results are already very encouraging. If you want to read more on ART, be sure to check out this great introduction. If you’re looking to test your application on ART, you’ll definitely want to review this best practices guide. Have a similar ART performance story? Be sure to let us know. In case you’re curious and want to try this out on your own, you can check out the benchmark source on GitHub.

Interested in getting other performance analytics for your Android apps? Try out New Relic Mobile today. If you’re brand new to it, you’ll get a free 30-day trial!

A longtime fan of New Relic, Jason joined the team to contribute his user insight, dark computer magic, and knowledge of vintage boomboxes.
