Wednesday, 21 November 2007

bytecode generation for fun and profiling

interfaces

An interface in Java is just a list of methods that can be implemented by classes. This lets you access objects of different classes in the same way, even if their implementation is unrelated.

At least in principle, the interface is written first, and then the class references it:

class Widget implements Drawable {

But to turn it around, you could define an interface as a subset of a class's methods, and use that to reduce coupling and allow calling code to be compiled without access to library code (common in J2EE).

The problem is that the class has to be tagged implements X at compile time.

example

I was working on a web-app and needed to set a parameter on an Oracle data source. I had a generic DataSource object, and the solution was to cast it to its real oracle.jdbc.pool.OracleDataSource and call the method. Three problems:

  • I now have to include the oracle driver when I compile
  • If a driver update or reconfiguration alters the class name, not only will my fix not work, but my if(ds instanceof OracleDataSource) line will cause a crash
  • This is all pretty grungy just so I can call setFoo(true).

These can be hacked around by using reflection, but it's ugly, not typesafe, and slow (less so these days).
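
For comparison, here is a minimal sketch of the reflection workaround. FakeVendorDataSource and trySetFoo are hypothetical names made up for this illustration; the stand-in class takes the place of the real driver class so the example compiles without the Oracle jar.

```java
import java.lang.reflect.Method;

public class ReflectiveToggle {

    // Hypothetical stand-in for the vendor class, which we do not want
    // on the compile-time classpath.
    public static class FakeVendorDataSource {
        private boolean foo;
        public void setFoo(boolean v) { foo = v; }
        public boolean isFoo() { return foo; }
    }

    /**
     * Calls setFoo(boolean) on the target if it has such a public method.
     * Returns true if the call was made, false if no such method exists.
     */
    public static boolean trySetFoo(Object target, boolean value) {
        try {
            // The method is looked up by name, so a typo or a changed
            // signature only shows up at runtime, not at compile time.
            Method m = target.getClass().getMethod("setFoo", boolean.class);
            m.invoke(target, value);
            return true;
        } catch (NoSuchMethodException e) {
            return false; // not the class we hoped for; carry on without the fix
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("setFoo failed", e);
        }
    }

    public static void main(String[] args) {
        FakeVendorDataSource ds = new FakeVendorDataSource();
        System.out.println(trySetFoo(ds, true) + " " + ds.isFoo());
        System.out.println(trySetFoo("not a data source", true));
    }
}
```

This removes the compile-time dependency and survives a class rename, but the string "setFoo" and the boolean.class argument are exactly the untyped parts that make it ugly.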

They can be fixed if the class implemented an interface which was provided in a separate jar, but it doesn't.

build a bridge and get over it

public interface ToggleFoo { public void setFoo(boolean v); }

Bridge.expose(datasource, ToggleFoo.class).setFoo(true);

The Bridge utility creates a wrapper class like this:

public class Bridge_OracleDataSource_ToggleFoo implements ToggleFoo {
   private Object __target;
   public void setFoo(boolean v) {
      ((OracleDataSource)__target).setFoo(v);
   }
}

It then instantiates it and sets __target to the object you pass. Since the wrapper is flagged as implements ToggleFoo, the instance can be cast to ToggleFoo and returned.

The class is generated at runtime using Apache BCEL. The compiled bytecode is created directly, without source code, and loaded using a custom ClassLoader.

When expose(Object, Class) is called (the Class object represents an interface), the library uses reflection to find the correct type to cast the object to. The same rules of visibility normally enforced by the compiler apply at the JVM level, so there has to be a public class/superclass that implements the methods in the interface.
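
To show the shape of expose() without pulling in BCEL, here is a rough sketch that swaps in a different technique: the JDK's java.lang.reflect.Proxy. This is not how the Bridge library works (it generates a real class, so calls do not go through reflection); ProxyBridge and Target are hypothetical names for this illustration only.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ProxyBridge {

    public interface ToggleFoo { void setFoo(boolean v); }

    // Hypothetical target: has setFoo but never declared "implements ToggleFoo".
    public static class Target {
        public boolean foo;
        public void setFoo(boolean v) { foo = v; }
    }

    public static <T> T expose(final Object target, Class<T> iface) {
        // Fail early if the target lacks any method named in the interface.
        for (Method m : iface.getMethods()) {
            try {
                target.getClass().getMethod(m.getName(), m.getParameterTypes());
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(
                    target.getClass().getName() + " has no method " + m.getName(), e);
            }
        }
        InvocationHandler handler = (proxy, method, args) -> {
            // Find the target's method with the same name and parameter types.
            Method real = target.getClass()
                                .getMethod(method.getName(), method.getParameterTypes());
            try {
                return real.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the target itself threw
            }
        };
        Object proxy = Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        Target t = new Target();
        expose(t, ToggleFoo.class).setFoo(true);
        System.out.println(t.foo);
    }
}
```

The visibility rule from the paragraph above applies here too: this sketch assumes the target's class is public, as Target is.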

subsetting interfaces

Another use is to convert from an interface type to a more restricted interface - a 'superinterface' that was never specified. For example:

public interface ImmutableList { 
   int size();
   Object get(int i);
}

ImmutableList lst = Bridge.expose(myArrayList, ImmutableList.class);

Now lst can be securely passed to client code without fear of modification.
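
As a sketch of what that buys you, here is a hand-written equivalent of the wrapper the bridge would generate for this case. The class name follows the naming pattern shown earlier, but it is an assumption for illustration, as is the SubsetDemo holder class.

```java
import java.util.ArrayList;
import java.util.List;

public class SubsetDemo {

    public interface ImmutableList {
        int size();
        Object get(int i);
    }

    // Hand-written stand-in for the generated wrapper class.
    public static class Bridge_ArrayList_ImmutableList implements ImmutableList {
        private final Object target;

        public Bridge_ArrayList_ImmutableList(Object target) { this.target = target; }

        public int size() { return ((ArrayList<?>) target).size(); }
        public Object get(int i) { return ((ArrayList<?>) target).get(i); }
    }

    public static void main(String[] args) {
        ArrayList<String> backing = new ArrayList<String>();
        backing.add("a");
        ImmutableList view = new Bridge_ArrayList_ImmutableList(backing);

        System.out.println(view.size());
        // The wrapper is not a List, so client code cannot cast its way
        // back to add() or remove().
        System.out.println(view instanceof List);

        // It is a view, not a copy: changes made by the owner show through.
        backing.add("b");
        System.out.println(view.size());
    }
}
```

Note that the wrapper only stops the client from modifying the list through the reference it was given. The owner of the original ArrayList can still change it.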

performance comparison

Setup: When a type exposes a new interface, the bridge class must be generated and loaded. This is actually very fast (~5ms in my tests), but the first use requires BCEL to load, which took about 500ms. I haven't included this in the tests below, but it matters for short-running tasks.

Each object: calling Bridge.expose() allocates a small new wrapper object, and sets a field in it.

Each call: Calling a method through the interface requires an interface method call (slower than a normal virtual one), a field access, and a virtual method call to the target.

The common alternatives to a bytecode generation approach are direct access and reflection.

Direct access to the object is always going to be fastest, and it's important to know how much speed you're giving up for this decoupling, or security (in the case of subsetting an interface). There is no setup, each object just has to be cast to the correct type, and calling is just a virtual method call.

Reflection was optimised in Java 1.4, but it still carries a penalty. Class.getMethod() must be called for each new class of target. In my test all targets were in fact the same class (even the same instance), but the code did not assume this, so the class has to be checked for each new object. The method is then called with Method.invoke().
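
The reflection path described above looks roughly like this. It is a sketch of the pattern, not the actual test code; CachedReflectiveCaller is a hypothetical name, and it only handles methods without arguments to keep things short.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.LinkedList;

public class CachedReflectiveCaller {
    private final String methodName;
    private Class<?> cachedClass;   // class the cached Method was looked up on
    private Method cachedMethod;
    private int lookups;            // counts getMethod() calls, for illustration

    public CachedReflectiveCaller(String methodName) {
        this.methodName = methodName;
    }

    /** Invokes the no-argument method on target, reusing the Method if possible. */
    public Object call(Object target) {
        try {
            Class<?> c = target.getClass();
            if (c != cachedClass) {          // this check runs on every call
                cachedMethod = c.getMethod(methodName);
                cachedClass = c;
                lookups++;
            }
            return cachedMethod.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("call to " + methodName + " failed", e);
        }
    }

    public int lookups() { return lookups; }

    public static void main(String[] args) {
        CachedReflectiveCaller size = new CachedReflectiveCaller("size");
        ArrayList<String> a = new ArrayList<String>();
        a.add("x");
        size.call(a);                         // first call: one lookup
        size.call(a);                         // same class: cached Method reused
        size.call(new LinkedList<String>());  // new class: second lookup
        System.out.println(size.lookups());   // prints 2
    }
}
```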

Timing results

Test                                             Direct   Reflection   Bridge
Invoke virtual method on same object             68ms     2772ms       258ms
Invoke virtual method on different object        70ms     2808ms       5621ms
Invoke method through interface on same object   163ms    3384ms       237ms

edit: improved performance by 30% on second test by removing an extra object allocation. Allocating objects in a tight loop is still slow.

Testing was on Sun Java 1.5.0_11 i386 on Linux (Ubuntu 7.02), on my laptop (Athlon XP 1800+). source

code

Is here. If it breaks, you keep both pieces. You need bcel.jar on your -classpath, from here.

There's a new library to generate JVM bytecode from Ruby. I'm not so hot on JRuby, but this looks cool for learning the bytecode.
