|
The Java Specialists' Newsletter
Issue 157 2008-03-06
Category:
Performance
Java version: 5+ Polymorphism Performance Mysteriesby Dr. Heinz M. KabutzAbstract:
Late binding is supposed to be a bottleneck in
applications - this was one of the criticisms of
early Java. The HotSpot Compiler is however rather good at
inlining methods that are being called through
polymorphism, provided that we do not have very many
implementation subclasses.
A warm welcome to the 157th issue of The Java(tm) Specialists' Newsletter sent to you from
the beautiful island of Crete. We signed a purchase agreement
on an olive grove on which we intend building our family
home. Most of the 100 olive trees should survive our
building project, which leaves me in a tricky situation. To harvest the olives
will take a fair amount of manpower. The trees should
produce a few hundred litres of fine pure cold pressed virgin
olive oil, but this would require quite a bit of manual
labour. Perhaps we should do a Java work camp where we
harvest olives during the day and Java at night? Any
volunteers? ;-)
Upcoming Java Specialist Master Courses:
"This course embodies my Java knowledge and experience gained publishing 180 advanced Java newsletters, teaching hundreds of seminars and writing hundreds of thousands of lines of Java code." Heinz Kabutz, The Java Specialists NewsletterParis, France, Feb 9-12 2010, €2500 - click to sign up. Düsseldorf, Germany (in German), Mar 2-5 2010, €2500 - click to sign up. San Jose CA, Mar 16-19 2010, $3500 - click to sign up. Oslo, Norway, Apr 13-16 2010, Kr 24500 - click to sign up. Chania, Crete, May 25-28 2010, €2500 - click to sign up.
In-house courses if these dates or locations do not suit you - click here for more information. Polymorphism Performance Mysteries
A few years ago, when I was still teaching Bruce Eckel's
excellent Handson Java course in South Africa, I wrote some
code to test the effects of polymorphism on performance. We
know that late binding is supposed to be more expensive than
static binding.
Polymorphism is essential to good design. In my Design Patterns Course
we look at how polymorphism features even in the Singleton
pattern. So is there a cost involved and if so, how much?
For microbenchmarks, I find it quite pointless to execute
them on the client HotSpot compiler. Almost all of our
serious Java programs will be executing on servers, so we
might as well concentrate on the server HotSpot compiler.
Let's start with classes A1 and B1. A1 contains a field
pointing to B1. In the run() method of A1, we call the
f() method in B1. We would expect the HotSpot compiler to
inline the method call, thus making the test exceedingly
fast. The method f() is not final,
but it should still be
able to determine that it is not being overwritten in any
subclasses, thus allowing it to be inlined.
public class A1 {
private final B1 b;
public A1(B1 b) {
this.b = b;
}
public void run() {
b.f();
}
}
public class B1 {
public void f() { }
}
The second set of classes is A2, B2 and C2. B2 is an
interface with C2 the actual
implementation. We would expect that the call to b.f() would
take longer than in our
first experiment, since we would expect to see late binding
thanks to polymorphism.
public class A2 {
private final B2 b;
public A2(B2 b) {
this.b = b;
}
public void run() {
b.f();
}
}
public interface B2 {
public void f();
}
public class C2 implements B2 {
public void f() {
}
}
The third set of classes are A3, B3, C3 and D3. B3 is again
an interface but this time with two
implementations, C3 and D3.
public class A3 {
private final B3 b;
public A3(B3 b) {
this.b = b;
}
public void run() {
b.f();
}
}
public interface B3 {
public void f();
}
public class C3 implements B3 {
public void f() {
}
}
public class D3 implements B3 {
public void f() {
}
}
The last set of classes are similar to A3, but with two
additional implementation classes. Say hello to: A4, B4, C4,
D4, E4 and F4.
public class A4 {
private final B4 b;
public A4(B4 b) {
this.b = b;
}
public void run() {
b.f();
}
}
public interface B4 {
public void f();
}
public class C4 implements B4 {
public void f() {
}
}
public class D4 implements B4 {
public void f() {
}
}
public class E4 implements B4 {
public void f() {
}
}
public class F4 implements B4 {
public void f() {
}
}
You might be wondering where I am going with this. Once we
have these classes with the polymorphic call to f(), we can
see if there is any difference at all in the performance of
run().
public class PolymorphismTest {
private static final int UPTO = 1000 * 1000 * 1000;
public static void main(String[] args) {
System.out.println("Empty\tA1\tA2\tA3\tA3/C+D\tA4");
for (int j = 0; j < 10; j++) {
long time = System.currentTimeMillis();
for (int i = 0; i < UPTO; i++) {
//empty loop
}
time = System.currentTimeMillis() - time;
System.out.print(time + "\t");
System.out.flush();
A1 a = new A1(new B1());
time = System.currentTimeMillis();
for (int i = 0; i < UPTO; i++) {
a.run();
}
time = System.currentTimeMillis() - time;
System.out.print(time + "\t");
System.out.flush();
A2 a2 = new A2(new C2());
time = System.currentTimeMillis();
for (int i = 0; i < UPTO; i++) {
a2.run();
}
time = System.currentTimeMillis() - time;
System.out.print(time + "\t");
System.out.flush();
A3 a3 = new A3(new C3());
time = System.currentTimeMillis();
for (int i = 0; i < UPTO; i++) {
a3.run();
}
time = System.currentTimeMillis() - time;
System.out.print(time + "\t");
System.out.flush();
A3 a3_c3 = new A3(new C3());
A3 a3_d3 = new A3(new D3());
time = System.currentTimeMillis();
for (int i = 0, upto=UPTO/2; i < upto; i++) {
a3_c3.run();
a3_d3.run();
}
time = System.currentTimeMillis() - time;
System.out.print(time + "\t");
System.out.flush();
A4 a4_c4 = new A4(new C4());
A4 a4_d4 = new A4(new D4());
A4 a4_e4 = new A4(new E4());
A4 a4_f4 = new A4(new F4());
time = System.currentTimeMillis();
for (int i = 0, upto=UPTO/4; i < upto; i++) {
a4_c4.run();
a4_d4.run();
a4_e4.run();
a4_f4.run();
}
time = System.currentTimeMillis() - time;
System.out.println(time);
}
}
}
Before I show you the results, I need to warn you that the
actual time to call the method is really small, irrespective
of whether we use polymorphism or not. Good design will
produce consistently faster code. So please do not go
changing your code to remove polymorphism, but rather look at
the results and try to figure out what is going on.
For the JDK 1.6.0_05 Server HotSpot, I got the following
results. The numbers in brackets is the standard deviation.
Java Version 1.6.0_05, Server VM
Empty A1 A2 A3 A3/C+D A4
0(0) 1126(1) 1691(6) 2255(9) 1691(7) 19469(28)
The empty loop has probably been eliminated by the
HotSpot compiler. The call to f() in A1 has probably been
inlined. Calls A2 and A3/C+D are virtually the same, whilst
A3 takes longer. A4 looks like it has been optimized very
little. It seems like the more active subclass objects we
have, the worse the polymorphism performance.
For completeness, we should also look at the Client HotSpot
compiler, plus the latest JDK 1.5.0_15.
Java Version 1.6.0_05, Client VM
Empty A1 A2 A3 A3/C+D A4
1694(10) 11299(482) 9031(6) 31583(34) 30457(28) 29896(24)
Java Version 1.5.0_15, Server VM
Empty A1 A2 A3 A3/C+D A4
0(0) 1126(1) 2258(9) 3680(34) 3384(10) 13279(914)
Java Version 1.5.0_15, Client VM
Empty A1 A2 A3 A3/C+D A4
2259(10) 10311(429) 10885(275) 19659(99) 19107(124) 18511(287)
I do not have a conclusive explanation for some of these
values (hence the title of this newsletter).
There is another mystery that I have not mentioned yet.
Usually when you run a benchmark, you discard the first set
of values, since they usually are much slower than the rest.
In this case, the opposite happens. Look at the first three
rows of raw data in running the test with the 1.6.0_05 Server
HotSpot compiler:
Empty A1 A2 A3 A3/C+D A4
9 9 97 13 1151 19575
0 1125 1692 2268 1688 19524
0 1125 1687 2251 1690 19489
If you run just A1, A2 and A3 repeatedly as tests, they very
quickly deteriorate in performance, so the method call is
then not completely eliminated anymore. This is the first
time that I have found an example where the HotSpot compiler
slows down after warming up. Perhaps it recognizes that
there may be other side effects with the additional classes
being added? I do not know, but I do find the results
curious.
I need to emphasize this once more, just to ensure that I
minimize misunderstandings: Good design beats
minute performance gains. Please do not go changing
production code! The one place where this knowledge can be
useful is when you are writing benchmarking code, just so you
are aware of possible causes of interference in your test.
Final Methods
Making the f() methods final in the implementation classes
did not change my results at all. In JDK 1.0 and 1.1, the
final methods were inlined by the static compiler, whereas in
JDK 1.2 onwards, they are inlined by the dynamic HotSpot
compiler. I only ever make methods final
for design reasons.
"Cost of Casting" Scapegoat for Polymorphism Cost?
The first time that I presented my Java Specialist Master
Course last year, we had a discussion as to
whether casting of objects is expensive. One of my students
vaguely remembered an example where by reducing the hierarchy
depth, they managed to significantly improve the performance.
Unfortunately he could not remember any details, so we were
not able to reproduce this scenario. However, I have for a
while now had the suspicion that some of the alleged cost of
casting might have been the cost of late binding?
Kind regards
Heinz
Performance Articles
Related Java Course
|