Java Specialists' Java Training Europehome of the java specialists' newsletter

The Java Specialists' Newsletter
Issue 1572008-03-06 Category: Performance Java version: 5+

GitHub Subscribe Free RSS Feed

Polymorphism Performance Mysteries

by Dr. Heinz M. Kabutz
Abstract:
Late binding is supposed to be a bottleneck in applications - this was one of the criticisms of early Java. The HotSpot Compiler is however rather good at inlining methods that are being called through polymorphism, provided that we do not have very many implementation subclasses.

A warm welcome to the 157th issue of The Java(tm) Specialists' Newsletter sent to you from the beautiful island of Crete. We signed a purchase agreement on an olive grove on which we intend building our family home. Most of the 100 olive trees should survive our building project, which leaves me in a tricky situation. To harvest the olives will take a fair amount of manpower. The trees should produce a few hundred litres of fine pure cold pressed virgin olive oil, but this would require quite a bit of manual labour. Perhaps we should do a Java work camp where we harvest olives during the day and Java at night? Any volunteers? ;-)

NEW: Please see our new "Extreme Java" course, combining concurrency, a little bit of performance and Java 8. Extreme Java - Concurrency & Performance for Java 8.

Polymorphism Performance Mysteries

A few years ago, when I was still teaching Bruce Eckel's excellent Handson Java course in South Africa, I wrote some code to test the effects of polymorphism on performance. We know that late binding is supposed to be more expensive than static binding.

Polymorphism is essential to good design. In my Design Patterns Course we look at how polymorphism features even in the Singleton pattern. So is there a cost involved and if so, how much?

For microbenchmarks, I find it quite pointless to execute them on the client HotSpot compiler. Almost all of our serious Java programs will be executing on servers, so we might as well concentrate on the server HotSpot compiler.

Let's start with classes A1 and B1. A1 contains a field pointing to B1. In the run() method of A1, we call the f() method in B1. We would expect the HotSpot compiler to inline the method call, thus making the test exceedingly fast. The method f() is not final, but it should still be able to determine that it is not being overwritten in any subclasses, thus allowing it to be inlined.

public class A1 {
  private final B1 b;
  public A1(B1 b) {
    this.b = b;
  }
  public void run() {
    b.f();
  }
}

public class B1 {
  public void f() { }
}
  

The second set of classes is A2, B2 and C2. B2 is an interface with C2 the actual implementation. We would expect that the call to b.f() would take longer than in our first experiment, since we would expect to see late binding thanks to polymorphism.

public class A2 {
  private final B2 b;
  public A2(B2 b) {
    this.b = b;
  }
  public void run() {
    b.f();
  }
}

public interface B2 {
  public void f();
}
    
public class C2 implements B2 {
  public void f() {
  }
}
  

The third set of classes are A3, B3, C3 and D3. B3 is again an interface but this time with two implementations, C3 and D3.

public class A3 {
  private final B3 b;
  public A3(B3 b) {
    this.b = b;
  }
  public void run() {
    b.f();
  }
}

public interface B3 {
  public void f();
}

public class C3 implements B3 {
  public void f() {
  }
}

public class D3 implements B3 {
  public void f() {
  }
}
  

The last set of classes are similar to A3, but with two additional implementation classes. Say hello to: A4, B4, C4, D4, E4 and F4.

public class A4 {
  private final B4 b;

  public A4(B4 b) {
    this.b = b;
  }

  public void run() {
    b.f();
  }
}
public interface B4 {
  public void f();
}
public class C4 implements B4 {
  public void f() {
  }
}
public class D4 implements B4 {
  public void f() {
  }
}
public class E4 implements B4 {
  public void f() {
  }
}
public class F4 implements B4 {
  public void f() {
  }
}
    

You might be wondering where I am going with this. Once we have these classes with the polymorphic call to f(), we can see if there is any difference at all in the performance of run().

public class PolymorphismTest {
  private static final int UPTO = 1000 * 1000 * 1000;

  public static void main(String[] args) {

    System.out.println("Empty\tA1\tA2\tA3\tA3/C+D\tA4");
    for (int j = 0; j < 10; j++) {

      long time = System.currentTimeMillis();
      for (int i = 0; i < UPTO; i++) {
        //empty loop
      }
      time = System.currentTimeMillis() - time;
      System.out.print(time + "\t");
      System.out.flush();

      A1 a = new A1(new B1());
      time = System.currentTimeMillis();
      for (int i = 0; i < UPTO; i++) {
        a.run();
      }
      time = System.currentTimeMillis() - time;
      System.out.print(time + "\t");
      System.out.flush();

      A2 a2 = new A2(new C2());
      time = System.currentTimeMillis();
      for (int i = 0; i < UPTO; i++) {
        a2.run();
      }
      time = System.currentTimeMillis() - time;
      System.out.print(time + "\t");
      System.out.flush();

      A3 a3 = new A3(new C3());
      time = System.currentTimeMillis();
      for (int i = 0; i < UPTO; i++) {
        a3.run();
      }
      time = System.currentTimeMillis() - time;
      System.out.print(time + "\t");
      System.out.flush();

      A3 a3_c3 = new A3(new C3());
      A3 a3_d3 = new A3(new D3());
      time = System.currentTimeMillis();
      for (int i = 0, upto=UPTO/2; i < upto; i++) {
        a3_c3.run();
        a3_d3.run();
      }
      time = System.currentTimeMillis() - time;
      System.out.print(time + "\t");
      System.out.flush();

      A4 a4_c4 = new A4(new C4());
      A4 a4_d4 = new A4(new D4());
      A4 a4_e4 = new A4(new E4());
      A4 a4_f4 = new A4(new F4());
      time = System.currentTimeMillis();
      for (int i = 0, upto=UPTO/4; i < upto; i++) {
        a4_c4.run();
        a4_d4.run();
        a4_e4.run();
        a4_f4.run();
      }
      time = System.currentTimeMillis() - time;
      System.out.println(time);
    }
  }
}
  

Before I show you the results, I need to warn you that the actual time to call the method is really small, irrespective of whether we use polymorphism or not. Good design will produce consistently faster code. So please do not go changing your code to remove polymorphism, but rather look at the results and try to figure out what is going on.

For the JDK 1.6.0_05 Server HotSpot, I got the following results. The numbers in brackets is the standard deviation.

Java Version 1.6.0_05, Server VM
Empty  A1       A2       A3       A3/C+D   A4
0(0)   1126(1)  1691(6)  2255(9)  1691(7)  19469(28)
  

The empty loop has probably been eliminated by the HotSpot compiler. The call to f() in A1 has probably been inlined. Calls A2 and A3/C+D are virtually the same, whilst A3 takes longer. A4 looks like it has been optimized very little. It seems like the more active subclass objects we have, the worse the polymorphism performance.

For completeness, we should also look at the Client HotSpot compiler, plus the latest JDK 1.5.0_15.

Java Version 1.6.0_05, Client VM
Empty     A1          A2       A3         A3/C+D     A4
1694(10)  11299(482)  9031(6)  31583(34)  30457(28)  29896(24)

Java Version 1.5.0_15, Server VM
Empty  A1       A2       A3        A3/C+D    A4
0(0)   1126(1)  2258(9)  3680(34)  3384(10)  13279(914)
    
Java Version 1.5.0_15, Client VM
Empty     A1          A2         A3        A3/C+D      A4
2259(10)  10311(429)  10885(275) 19659(99) 19107(124)  18511(287)
  

I do not have a conclusive explanation for some of these values (hence the title of this newsletter).

There is another mystery that I have not mentioned yet. Usually when you run a benchmark, you discard the first set of values, since they usually are much slower than the rest. In this case, the opposite happens. Look at the first three rows of raw data in running the test with the 1.6.0_05 Server HotSpot compiler:

Empty  A1    A2    A3    A3/C+D  A4
9      9     97    13    1151    19575
0      1125  1692  2268  1688    19524
0      1125  1687  2251  1690    19489
  

If you run just A1, A2 and A3 repeatedly as tests, they very quickly deteriorate in performance, so the method call is then not completely eliminated anymore. This is the first time that I have found an example where the HotSpot compiler slows down after warming up. Perhaps it recognizes that there may be other side effects with the additional classes being added? I do not know, but I do find the results curious.

I need to emphasize this once more, just to ensure that I minimize misunderstandings: Good design beats minute performance gains. Please do not go changing production code! The one place where this knowledge can be useful is when you are writing benchmarking code, just so you are aware of possible causes of interference in your test.

Final Methods

Making the f() methods final in the implementation classes did not change my results at all. In JDK 1.0 and 1.1, the final methods were inlined by the static compiler, whereas in JDK 1.2 onwards, they are inlined by the dynamic HotSpot compiler. I only ever make methods final for design reasons.

"Cost of Casting" Scapegoat for Polymorphism Cost?

The first time that I presented my Java Specialist Master Course last year, we had a discussion as to whether casting of objects is expensive. One of my students vaguely remembered an example where by reducing the hierarchy depth, they managed to significantly improve the performance. Unfortunately he could not remember any details, so we were not able to reproduce this scenario. However, I have for a while now had the suspicion that some of the alleged cost of casting might have been the cost of late binding?

Kind regards

Heinz

Performance Articles Related Java Course

Java Master
Java Concurrency
Design Patterns
In-House Courses



© 2010-2014 Heinz Kabutz - All Rights Reserved Sitemap
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. JavaSpecialists.eu is not connected to Oracle, Inc. and is not sponsored by Oracle, Inc.
@CORE_THE_BAND #RBBJGR