Diving Into Bytecode Manipulation: Creating an Audit Log With ASM and Javassist

With Spring and Hibernate on your stack, your application’s bytecode is likely enhanced or manipulated at runtime. Bytecode is the instruction set of the Java Virtual Machine (JVM), and all languages that run on the JVM must eventually compile down to bytecode. Bytecode is manipulated for a variety of reasons:

Program analysis:

    • find bugs in your application
    • examine code complexity
    • find classes with a specific annotation

Class generation:

    • lazy load data from a database using proxies

Security:

    • restrict access to certain APIs
    • code obfuscation

Transforming classes without the Java source code:

    • code profiling
    • code optimization

And finally, adding logging to applications.

There are several tools that can be used to manipulate bytecode, ranging from very low-level tools such as ASM, which require you to work at the bytecode level, to high-level frameworks such as AspectJ, which allow you to write pure Java.

[Figure: Bytecode manipulation frameworks]

In this post, I will demonstrate one way to use Javassist and ASM to create an audit log.

Audit Log Example

Suppose we have the following code:


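public class BankTransactions {
    public static void main(String[] args) {
        BankTransactions bank = new BankTransactions();

        for (int i = 0; i < 100; i++) {
            String accountId = "account" + i;
            bank.login("password", accountId, "Ashley");
            bank.unimportantProcessing(accountId);
            bank.withdraw(accountId, Double.valueOf(i));
        }

        System.out.println("Transactions completed");
    }

    public void login(String password, String accountId, String userName) {
        // login logic
    }

    public void unimportantProcessing(String accountId) {
        // processing we do not need to audit
    }

    public void withdraw(String accountId, Double moneyToRemove) {
        // transaction logic
    }
}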

We want to log the important actions along with the key information that identifies each action. In the code above, the important actions are login and withdraw. For login, the important information is the account ID and the user; for withdraw, it is the account ID and the amount of money withdrawn. One way to log these actions is to add a logging statement to every important method, but that quickly becomes tedious. Instead, we can add an annotation to the important methods and then use a tool to inject the logging. In this case, the tool will be a bytecode manipulation framework.


@ImportantLog(fields = { "1", "2" })
public void login(String password, String accountId, String userName) {
    // login logic
}

@ImportantLog(fields = { "0", "1" })
public void withdraw(String accountId, Double moneyToRemove) {
    // transaction logic
}

The @ImportantLog annotation indicates that we want to log a message every time the method is called, and the fields parameter on the @ImportantLog annotation gives the zero-based index of each parameter that should be logged. For example, for login we want to log the parameters at positions 1 and 2, which are the accountId and the userName. We will not log the password at position 0.
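The post does not show how @ImportantLog itself is declared. Below is a minimal sketch of one possible declaration, assuming a single String[] member named fields and no @Retention annotation, so the default RetentionPolicy.CLASS applies. That matches the Javassist code below, which looks for the annotation under the invisible tag. The package declaration (com.example.spring2gx.mains, per the METHOD_ANNOTATION constant) is left out so the sketch runs as a single file, and ImportantLogDemo is a hypothetical class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical declaration of @ImportantLog (assumption: the original is not shown).
// With no @Retention, the default RetentionPolicy.CLASS applies: javac records the
// annotation in the class file as a RuntimeInvisibleAnnotations attribute, so a
// bytecode tool can read it, but reflection cannot.
@Target(ElementType.METHOD)
@interface ImportantLog {
    // Zero-based indexes of the parameters to log, as strings ("0", "1", ...)
    String[] fields();
}

public class ImportantLogDemo {

    @ImportantLog(fields = { "1", "2" })
    public void login(String password, String accountId, String userName) {
        // login logic
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method login = ImportantLogDemo.class.getMethod(
                "login", String.class, String.class, String.class);
        // Prints "null": the CLASS-retained annotation is invisible at runtime.
        System.out.println(login.getAnnotation(ImportantLog.class));
    }
}
```

The null result is the point: because the annotation is not visible to reflection, the agent has to find it in the class bytes.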

There are two main advantages of using bytecode and annotations to perform the logging:

  1. The logging is separated from the business logic, which helps keep the code clean and simple.
  2. It is easy to remove the audit logging without modifying the source code.

Where do we actually modify the bytecode?

We can use a core Java feature, introduced in Java 1.5 as part of the java.lang.instrument package, to manipulate the bytecode. This feature is called a Java agent.

To understand a Java agent, let’s look at the flow of a typical Java process.

[Figure: flow of a typical Java process]

The java command is executed with a single argument: the class containing our main method. This starts a Java runtime environment, uses a class loader to load the input class, and invokes the main method on the class. In our example, the main method of BankTransactions is invoked, which causes some processing to occur and “Transactions completed” to be printed.

Now let’s examine a Java process that uses a Java agent.

[Figure: flow of a Java process that uses a Java agent]

The java command is run with two arguments. The first is the JVM argument -javaagent, pointing to the agent jar. The second is the class containing our main method. The -javaagent flag tells the JVM to load the agent first. The agent class must be named by the Premain-Class attribute in the agent jar's manifest. Once that class is loaded, its premain method is invoked. The premain method acts as a setup hook for the agent: it allows the agent to register a class transformer. Once a class transformer is registered with the JVM, it receives the bytes of every class before that class is loaded into the JVM. This gives the class transformer the opportunity to modify the bytes of a class as needed. After modifying the bytes, the class transformer returns them to the JVM, which then verifies and loads them.

In our example, when BankTransactions is loaded, its bytes first go to the class transformer for potential modification. The modified bytes are then returned and loaded into the JVM. Once the class is loaded, its main method is invoked, causing some processing to occur and “Transactions completed” to be printed.
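Those bytes are simply the contents of the class file. As a small, stdlib-only sketch (the class name ClassHeaderPeek is hypothetical), the code below reads the fixed header that every class file starts with: the magic number 0xCAFEBABE followed by the minor and major version. It loads the bytes of java.lang.String from the class path instead of receiving them in a transformer, and its use of InputStream.readAllBytes assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Reads the fixed header at the start of a class file: the same kind of bytes
// a class transformer receives as its classfileBuffer argument.
public class ClassHeaderPeek {

    // Returns {magic, major, minor}. Class files are big-endian, which is
    // also ByteBuffer's default byte order.
    static int[] readHeader(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt();            // always 0xCAFEBABE
        int minor = buf.getShort() & 0xFFFF; // minor version comes first
        int major = buf.getShort() & 0xFFFF; // e.g. 52 = Java 8, 61 = Java 17
        return new int[] { magic, major, minor };
    }

    public static void main(String[] args) throws IOException {
        // Load the bytes of a JDK class from the class path (readAllBytes: Java 9+).
        try (InputStream in = String.class.getResourceAsStream("String.class")) {
            int[] h = readHeader(in.readAllBytes());
            System.out.printf("magic=%X major=%d minor=%d%n", h[0], h[1], h[2]);
        }
    }
}
```

Inside a transformer you could call readHeader(classfileBuffer) directly; the major version it reports depends on the compiler that produced the class.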

Let’s view the code. Below I have the premain method of the agent:


import java.lang.instrument.Instrumentation;

public class JavassistAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        System.out.println("Starting the agent");
        inst.addTransformer(new ImportantLogClassTransformer());
    }
}

The premain method prints a message and then registers a class transformer. The class transformer must implement the transform method, which the JVM invokes for each class loaded after the transformer is registered. The JVM passes in the bytes of the class as a byte array, and the method returns the modified byte array. If the class transformer decides not to modify a particular class, it can return null.


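import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;

public class ImportantLogClassTransformer implements ClassFileTransformer {

    public byte[] transform(ClassLoader loader, String className,
            Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
            byte[] classfileBuffer) throws IllegalClassFormatException {

        // manipulate the bytes here and return the modified bytes;
        // returning null leaves the class unchanged
        return null;
    }
}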
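The skeleton above leaves the body open. As a minimal, runnable sketch of the contract (the class PassThroughTransformer and its com/example/ filter are hypothetical), here is a transformer that only reports the classes it sees and always returns null, so every class is loaded unchanged. The main method calls transform directly rather than through an agent, just to exercise it.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

// Hypothetical pass-through transformer: it reports the application classes it
// sees and returns null, which tells the JVM to keep the original bytes.
public class PassThroughTransformer implements ClassFileTransformer {

    @Override
    public byte[] transform(ClassLoader loader, String className,
            Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
            byte[] classfileBuffer) {
        // className is in internal form ("com/example/Foo"); guard against null,
        // which can occur in practice for some runtime-generated classes.
        if (className != null && className.startsWith("com/example/")) {
            System.out.println("Saw " + className + " ("
                    + classfileBuffer.length + " bytes)");
        }
        return null; // null = "no change"
    }

    public static void main(String[] args) {
        // Calling transform directly, outside an agent, just to exercise it.
        byte[] result = new PassThroughTransformer().transform(
                null, "com/example/Foo", null, null, new byte[] { 1, 2, 3 });
        System.out.println(result == null);
    }
}
```

In an agent, you would register it from premain with inst.addTransformer(new PassThroughTransformer()).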

Now that we know where to modify the bytes of a class, we need to know how to modify the bytes.

How do we modify the bytes using Javassist?

Javassist is a bytecode manipulation framework with both a high-level and a low-level API. I’ll focus on the high-level, object-oriented API, starting with an explanation of the objects in Javassist and then moving on to the actual code for the audit log application.

[Figure: the main Javassist objects]

Javassist uses a CtClass object to represent a class. CtClass objects are obtained from a ClassPool and are used to modify classes. The ClassPool is a container of CtClass objects, implemented as a hash table in which the key is the name of the class and the value is the CtClass object representing that class. The default ClassPool uses the same class path as the underlying JVM, so in some cases you may need to add class paths or class bytes to a ClassPool.

Just as a Java class contains fields, methods, and constructors, a CtClass object contains CtFields, CtConstructors, and CtMethods. All of these objects can be modified. I’ll focus on method manipulation, since that is what our audit log application requires.

Below are a few of the ways to modify a method:

[Figure: ways to modify a method with Javassist]

The diagram above shows one of the main advantages of Javassist: you do not actually have to write bytecode. Instead, you write Java code. The one complication is that the Java code must be passed in as strings.

Now that we understand the basic building blocks of Javassist, let’s move to the actual code for our application. The transform method of the class transformer needs to perform the following steps:

  1. Convert byte array to a CtClass object
  2. Check each method of CtClass for the annotation @ImportantLog
  3. If @ImportantLog annotation is present on the method, then
    • Get the indexes of the important method parameters
    • Add logging statement to beginning of the method

As you write Java code using Javassist, be wary of the following gotchas:

    • The JVM uses slashes between packages (com/example/Foo), while Javassist uses dots (com.example.Foo).
    • When inserting more than one statement, the code needs to be wrapped in curly braces.
    • When referencing method parameter values using $1, $2, etc., know that $0 is reserved for “this”. This means the value of the first parameter to your method is $1.
    • Annotations are stored under either a visible or an invisible tag. Annotations with RetentionPolicy.RUNTIME are visible; annotations with the default RetentionPolicy.CLASS are invisible and cannot be seen at runtime through reflection.
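The first and third gotchas can be captured in two tiny helpers. This is a stdlib-only sketch; the class JavassistNaming and its methods are hypothetical and not part of Javassist. It uses String.replace(char, char), which performs the same conversion as the replaceAll("/", ".") call in the original code without involving regular expressions.

```java
// Hypothetical helpers for two of the Javassist gotchas listed above.
public class JavassistNaming {

    // The JVM (and ClassFileTransformer's className) uses internal names with
    // slashes; ClassPool.get(...) expects the dotted, fully qualified name.
    static String toJavassistName(String internalName) {
        return internalName.replace('/', '.');
    }

    // In Javassist source snippets, $1 is the first declared parameter
    // ($0 is "this"), so a zero-based parameter index maps to "$" + (index + 1).
    static String paramRef(int zeroBasedIndex) {
        return "$" + (zeroBasedIndex + 1);
    }

    public static void main(String[] args) {
        System.out.println(toJavassistName("com/example/spring2gx/mains/ImportantLog"));
        // login's important parameters are at indexes 1 and 2
        System.out.println(paramRef(1) + " " + paramRef(2));
    }
}
```

For login, the zero-based indexes 1 and 2 from @ImportantLog become $2 and $3.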

The actual Java code is below.


import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import javassist.ByteArrayClassPath;
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.bytecode.AnnotationsAttribute;
import javassist.bytecode.MethodInfo;
import javassist.bytecode.annotation.Annotation;
import javassist.bytecode.annotation.ArrayMemberValue;
import javassist.bytecode.annotation.MemberValue;
import javassist.bytecode.annotation.StringMemberValue;

public class ImportantLogClassTransformer implements ClassFileTransformer {

  private static final String METHOD_ANNOTATION =
      "com.example.spring2gx.mains.ImportantLog";
  private static final String ANNOTATION_ARRAY = "fields";

  private ClassPool pool;

  public ImportantLogClassTransformer() {
    pool = ClassPool.getDefault();
  }

  public byte[] transform(ClassLoader loader, String className,
      Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
      byte[] classfileBuffer) throws IllegalClassFormatException {
    if (className == null) {
      return null;
    }
    try {
      // the JVM passes slash-separated names, but Javassist (including
      // ByteArrayClassPath's name lookup) expects dots
      String dottedName = className.replace('/', '.');
      pool.insertClassPath(new ByteArrayClassPath(dottedName,
          classfileBuffer));
      CtClass cclass = pool.get(dottedName);
      if (!cclass.isFrozen()) {
        boolean modified = false;
        for (CtMethod currentMethod : cclass.getDeclaredMethods()) {
          Annotation annotation = getAnnotation(currentMethod);
          if (annotation != null) {
            List<String> parameterIndexes = getParamIndexes(annotation);
            currentMethod.insertBefore(createJavaString(
                currentMethod, className, parameterIndexes));
            modified = true;
          }
        }
        // only hand back new bytes for classes we actually changed
        if (modified) {
          return cclass.toBytecode();
        }
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
    // null tells the JVM to keep the original bytes
    return null;
  }

  private Annotation getAnnotation(CtMethod method) {
    MethodInfo mInfo = method.getMethodInfo();
    // the attribute we are looking for is a runtime invisible attribute;
    // use @Retention(RetentionPolicy.RUNTIME) on the annotation to make it
    // visible at runtime (and then read visibleTag instead)
    AnnotationsAttribute attInfo = (AnnotationsAttribute) mInfo
        .getAttribute(AnnotationsAttribute.invisibleTag);
    if (attInfo != null) {
      // this is the type name, meaning use dots instead of slashes
      return attInfo.getAnnotation(METHOD_ANNOTATION);
    }
    return null;
  }

  private List<String> getParamIndexes(Annotation annotation) {
    ArrayMemberValue fields = (ArrayMemberValue) annotation
        .getMemberValue(ANNOTATION_ARRAY);
    if (fields != null) {
      MemberValue[] values = fields.getValue();
      List<String> parameterIndexes = new ArrayList<>();
      for (MemberValue val : values) {
        parameterIndexes.add(((StringMemberValue) val).getValue());
      }
      return parameterIndexes;
    }
    return Collections.emptyList();
  }

  private String createJavaString(CtMethod currentMethod, String className,
      List<String> indexParameters) {
    StringBuilder sb = new StringBuilder();
    sb.append("{StringBuilder sb = new StringBuilder");
    sb.append("(\"A call was made to method '\");");
    sb.append("sb.append(\"");
    sb.append(currentMethod.getName());
    sb.append("\");sb.append(\"' on class '\");");
    sb.append("sb.append(\"");
    sb.append(className);
    sb.append("\");sb.append(\"'.\");");
    sb.append("sb.append(\"\\n    Important params:\");");
    for (String index : indexParameters) {
      try {
        // add one because in Javassist $1 is always the first parameter;
        // $0 is "this" (and is not available in static methods)
        int localVar = Integer.parseInt(index) + 1;
        sb.append("sb.append(\"\\n        Index \");");
        sb.append("sb.append(\"");
        sb.append(index);
        sb.append("\");sb.append(\" value: \");");
        sb.append("sb.append($" + localVar + ");");
      } catch (NumberFormatException e) {
        e.printStackTrace();
      }
    }
    sb.append("System.out.println(sb.toString());}");
    return sb.toString();
  }
}

And we are done! We can run our application with the agent attached and see the logging printed to System.out.
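To see exactly what insertBefore receives, it can help to print the generated snippet. The sketch below is a stand-alone copy of createJavaString that takes the method name as a String instead of a CtMethod (so it runs without Javassist) and omits the NumberFormatException handling. The class name GeneratedSnippetPreview and the class name argument com/example/BankTransactions are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone copy of createJavaString that takes the method name as a String
// instead of a CtMethod, so the generated snippet can be inspected without Javassist.
public class GeneratedSnippetPreview {

    static String createJavaString(String methodName, String className,
            List<String> indexParameters) {
        StringBuilder sb = new StringBuilder();
        sb.append("{StringBuilder sb = new StringBuilder");
        sb.append("(\"A call was made to method '\");");
        sb.append("sb.append(\"");
        sb.append(methodName);
        sb.append("\");sb.append(\"' on class '\");");
        sb.append("sb.append(\"");
        sb.append(className);
        sb.append("\");sb.append(\"'.\");");
        sb.append("sb.append(\"\\n    Important params:\");");
        for (String index : indexParameters) {
            int localVar = Integer.parseInt(index) + 1; // $1 is the first parameter
            sb.append("sb.append(\"\\n        Index \");");
            sb.append("sb.append(\"");
            sb.append(index);
            sb.append("\");sb.append(\" value: \");");
            sb.append("sb.append($" + localVar + ");");
        }
        sb.append("System.out.println(sb.toString());}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // "com/example/BankTransactions" is a hypothetical internal class name
        System.out.println(createJavaString("login",
                "com/example/BankTransactions", Arrays.asList("1", "2")));
    }
}
```

For login, the printed snippet appends $2 and $3 (accountId and userName). Because transform passes the class name through in its slash-separated form, the audit message will also show slashes unless it is converted to dots first.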

On the positive side, the amount of code written is fairly small, and we did not have to write any bytecode to use Javassist. The big drawback is that writing Java code inside strings can become tedious, and mistakes in that code only surface when the transformer runs. Javassist also has to compile those strings at transformation time, and lower-level frameworks are generally faster. Let’s take a look at one of those faster frameworks.

How do we modify the bytes using ASM?

ASM is a bytecode manipulation framework that has a small memory footprint and is relatively fast. I consider ASM to be the industry standard for bytecode manipulation; many other tools, such as cglib, are built on top of it. ASM provides both an object-based (tree) API and an event-based (core) API, but here I’ll focus on the event-based model.

To understand ASM, I will start with a diagram (below) of a Java class that comes from ASM’s own documentation. It shows that a Java class is composed of several parts, including a superclass, interfaces, annotations, fields, and methods. In ASM’s event-based model, all of these class components can be considered events.

[Figure: the components of a Java class, from the ASM documentation]

The class events for ASM are defined on ClassVisitor. To see these events, you create a ClassVisitor subclass that overrides the visit methods for the components you are interested in.

[Figure: class events on a ClassVisitor]

In addition to a class visitor, we need something to parse the class and generate events. ASM provides a ClassReader for this purpose: it parses the class and produces events. After the class has been parsed, we need a ClassWriter to consume the events and convert them back into a class byte array. In the diagram below, the bytes of the BankTransactions class are passed to a ClassReader, which sends events to a ClassWriter, which outputs the resulting BankTransactions bytes. When no ClassVisitor is present, the output bytes should essentially match the input bytes.

[Figure: ClassReader and ClassWriter]


public byte[] transform(ClassLoader loader, String className,
    Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
    byte[] classfileBuffer) throws IllegalClassFormatException {
                                  
    ClassReader cr = new ClassReader(classfileBuffer);
    ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_FRAMES);

    cr.accept(cw, 0);
    return cw.toByteArray();
}

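The ClassReader takes in the bytes of the class, and the ClassWriter takes in the ClassReader, which lets ASM copy the original constant pool and any untransformed methods as is. The COMPUTE_FRAMES flag tells the writer to recompute stack map frames, which matters once we start inserting instructions. Calling accept on the ClassReader parses the class and sends its events to the ClassWriter. Finally, we get the resulting bytes from the ClassWriter.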

Now suppose we want to modify the BankTransactions bytes. First, we need to chain in a ClassVisitor. This ClassVisitor overrides methods such as visitField or visitMethod to receive notifications about those specific class components.

[Figure: ClassVisitor chain]

Below is the code representing the diagram above. The class visitor LogMethodClassVisitor has been added. Note that you can add more than one class visitor.


public byte[] transform(ClassLoader loader, String className,
    Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
    byte[] classfileBuffer) throws IllegalClassFormatException {
                            
    ClassReader cr = new ClassReader(classfileBuffer);
    ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_FRAMES);
    ClassVisitor cv = new LogMethodClassVisitor(cw, className);
    cr.accept(cv, 0);
    return cw.toByteArray();
}

For the audit log application, we need to examine each method on the class. This means the ClassVisitor need only override ‘visitMethod’.


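import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class LogMethodClassVisitor extends ClassVisitor {
    private String className;

    public LogMethodClassVisitor(ClassVisitor cv, String pClassName) {
        super(Opcodes.ASM5, cv);
        className = pClassName;
    }

    @Override
    public MethodVisitor visitMethod(int access, String name, String desc,
            String signature, String[] exceptions) {
        // put logic in here; delegating to super leaves the method unchanged
        return super.visitMethod(access, name, desc, signature, exceptions);
    }
}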

Notice that the visitMethod returns a MethodVisitor. Just like a class has many components, a method also has many components, which can be considered events when the method is parsed.

[Figure: method components]

The MethodVisitor provides the events for a method. For the audit logging application, we want to examine the annotations on the method and, based on them, possibly modify the code in the method. To make these modifications, we chain in a MethodVisitor as shown below.


@Override
public MethodVisitor visitMethod(int access, String name, String desc, 
    String signature, String[] exceptions) {                               
    MethodVisitor mv = super.visitMethod(access, name, desc, signature,
            exceptions);
    return new PrintMessageMethodVisitor(mv, name, className);
}

This PrintMessageMethodVisitor will need to override visitAnnotation and visitCode. Note that visitAnnotation returns an AnnotationVisitor. Just like classes and methods have components, there are also multiple components to an annotation. An AnnotationVisitor allows us to visit all parts of an annotation.

Below I have outlined the steps for visitAnnotation and visitCode.


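public class PrintMessageMethodVisitor extends MethodVisitor {
  private final String methodName;
  private final String className;
  private boolean isAnnotationPresent;
  private final List<String> parameterIndexes = new ArrayList<>();

  public PrintMessageMethodVisitor(MethodVisitor mv, String methodName,
      String className) {
    super(Opcodes.ASM5, mv);
    this.methodName = methodName;
    this.className = className;
  }

  @Override
  public AnnotationVisitor visitAnnotation(String desc, boolean visible) {
    // 1. check method for annotation @ImportantLog
    // 2. if annotation present, then get important method param indexes
    return super.visitAnnotation(desc, visible);
  }

  @Override
  public void visitCode() {
    super.visitCode();
    // 3. if annotation present, add logging to beginning of the method
  }
}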

As you write Java code using ASM, be wary of the following gotchas:

  • In the event model, the events for a class or method always occur in a particular order. For example, the annotations on a method are always visited before the actual code.
  • ASM has no $1, $2 shorthand: method parameters are read from local variable slots with instructions such as ALOAD. For an instance method, slot 0 holds “this”, so the first parameter is in slot 1; for a static method, the first parameter is in slot 0. Parameters of type long or double occupy two slots.
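To make the slot rules concrete, here is a stdlib-only sketch that computes the local variable slot of each parameter from a method descriptor, the desc string ASM passes to visitMethod. The helper ParamSlots.parameterSlots is hypothetical; in real ASM code, Type.getArgumentTypes(desc) together with Type.getSize() gives the same information.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: computes the local variable slot of each parameter from
// a JVM method descriptor such as "(Ljava/lang/String;Ljava/lang/Double;)V".
public class ParamSlots {

    static int[] parameterSlots(String desc, boolean isStatic) {
        List<Integer> slots = new ArrayList<>();
        int slot = isStatic ? 0 : 1; // slot 0 is "this" for instance methods
        int i = 1;                   // skip the opening '('
        while (desc.charAt(i) != ')') {
            slots.add(slot);
            char c = desc.charAt(i);
            slot += (c == 'J' || c == 'D') ? 2 : 1; // long and double take two slots
            while (desc.charAt(i) == '[') {
                i++;                 // skip array dimensions (arrays are references)
            }
            if (desc.charAt(i) == 'L') {
                i = desc.indexOf(';', i) + 1; // object types end at ';'
            } else {
                i++;                 // primitive types are a single character
            }
        }
        return slots.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // withdraw(String accountId, Double moneyToRemove), an instance method
        System.out.println(Arrays.toString(parameterSlots(
                "(Ljava/lang/String;Ljava/lang/Double;)V", false)));
        // a static method taking (long, String)
        System.out.println(Arrays.toString(parameterSlots(
                "(JLjava/lang/String;)V", true)));
    }
}
```

For login, an instance method with descriptor (Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)V, the slots are 1, 2, and 3, so the parameter at @ImportantLog index 1 (accountId) would be loaded with ALOAD 2.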

The actual Java code is below:


@Override
public AnnotationVisitor visitAnnotation(String desc, boolean visible) {
  // ASM reports annotation types as descriptors: L<internal name>;
  if ("Lcom/example/spring2gx/mains/ImportantLog;".equals(desc)) {
    isAnnotationPresent = true;
    return new AnnotationVisitor(Opcodes.ASM5,
        super.visitAnnotation(desc, visible)) {
      @Override
      public AnnotationVisitor visitArray(String name) {
        if ("fields".equals(name)) {
          return new AnnotationVisitor(Opcodes.ASM5,
              super.visitArray(name)) {
            @Override
            public void visit(String name, Object value) {
              // each array element is one parameter index, e.g. "1"
              parameterIndexes.add((String) value);
              super.visit(name, value);
            }
          };
        } else {
          return super.visitArray(name);
        }
      }
    };
  }
  return super.visitAnnotation(desc, visible);
}

@Override
public void visitCode() {
  if (isAnnotationPresent) {
    // push System.out, then create the string builder
    mv.visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System",
        "out", "Ljava/io/PrintStream;");
    mv.visitTypeInsn(Opcodes.NEW, "java/lang/StringBuilder");
    mv.visitInsn(Opcodes.DUP);
    // add everything to the string builder
    mv.visitLdcInsn("A call was made to method \"");
    mv.visitMethodInsn(Opcodes.INVOKESPECIAL,
        "java/lang/StringBuilder", "<init>",
        "(Ljava/lang/String;)V", false);
    mv.visitLdcInsn(methodName);
    mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL,
        "java/lang/StringBuilder", "append",
        "(Ljava/lang/String;)Ljava/lang/StringBuilder;", false);

. . .

One of the major differences between Javassist and ASM can be seen above. With ASM, you have to write code at the bytecode level when modifying methods, which means you need a good understanding of how the JVM works: you need to know exactly what is on the operand stack and in the local variables at any given moment. Working at the bytecode level gives you more control over functionality and performance, but it also means ASM has a steeper learning curve.
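Reading the instructions back as Java can help when tracking the stack. The sketch below is a hypothetical, stdlib-only rendering of the visitCode instructions shown above, stopping where that listing is elided; the comments give the operand stack after each step. It illustrates the stack effects only and is not code from the original project.

```java
import java.io.PrintStream;

// Hypothetical Java-level reading of the instructions emitted in visitCode
// above, up to the point where that listing is elided. Comments give the
// instruction(s) and the operand stack afterwards (top of stack on the right).
public class VisitCodeEquivalent {

    static String firstSteps(String methodName) {
        PrintStream out = System.out;            // GETSTATIC System.out -> [out]
        StringBuilder sb = new StringBuilder(    // NEW, DUP             -> [out, sb, sb]
                "A call was made to method \""); // LDC, INVOKESPECIAL <init>
                                                 //                      -> [out, sb]
        sb = sb.append(methodName);              // LDC, INVOKEVIRTUAL append
                                                 //                      -> [out, sb]
        // In the real bytecode, out stays on the stack for the elided instructions.
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstSteps("login"));
    }
}
```

Note that append returns the same StringBuilder, which is why the stack still holds [out, sb] after the call rather than growing.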

Take-home exercise

Now that you have seen how to use ASM and Javassist in one scenario, I encourage you to give a bytecode manipulation framework a try. Bytecode manipulation not only gives you a better understanding of the JVM, but there are countless applications for it. Once started, you will find that the sky’s the limit.

This blog is based on the technical content from the talk I gave at SpringOne 2GX 2014 on bytecode manipulation. Be sure to check out the video on InfoQ once it is released.

Ashley Puls is a principal software engineer and architect at New Relic. She has a background in mathematics and enjoys diving into the details of language agents. When not programming, Ashley competes in triathlons and soccer games.

Interested in writing for New Relic Blog? Send us a pitch!