|
The Java Specialists' Newsletter
Issue 158 2008-04-01
Category:
Performance
Java version: 6 Polymorphism Performance Mysteries Explainedby Dr. Heinz M. KabutzAbstract:
In this newsletter, we reveal some of the polymorphism
mysteries in the JDK. The HotSpot Server Compiler can
distinguish between mono-morphism, bi-morphism and
poly-morphism. The bi-morphism is a special case which
executes faster than poly-morphism. Mono-morphism can be
inlined by the compiler in certain circumstances, thus not
costing anything at all.

A warm welcome to The Java(tm) Specialists' Newsletter, sent to you from the Island of
Crete in the Mediterranean. Last week I was in Estonia, a
delightful little country just south of Finland. The total
population is about 1.2 million people. For those Dilbert
lovers out there, Estonia is not Elbonia ... ;-) Therefore,
do not expect to find waist-high mud and pigs on phone lines.
Instead, you will find a booming little country with super
fast internet, plus if you are lucky, a bit of snow and ice.
It is a country that is worth visiting, especially now that
they are a member of the Schengen states, which simplifies
travel for people on odd passports (such as my South
African). We had several programmers from other cities and
countries on our Java Specialist Master Course.
Upcoming Java Specialist Master Courses:
- please click here to sign up.
As from May 2010, we are also offering this course on the island of Crete. We
only accept 6 students per class in Crete, due to the size of our conference
room. Please book early to avoid disappointment!
San Jose CA, Mar 16-19 2010, $3500 Ottawa, Canada, Mar 22-25 2010, $3500 Oslo, Norway, Apr 13-16 2010, Kr 24500 Montreal, Canada, Apr 20-23 2010, $3500 Toronto, Canada, May 17-20 2010, $3500 Chania, Crete, May 25-28, Jun 29-Jul 2 or Aug 24-27 2010, €2500
In-house courses if these dates or locations do not suit you - click here for more information. Polymorphism Performance Mysteries Explained
After our previous
newsletter, Cliff Click, the author of the
Server HotSpot VM, confirmed what we had observed by
experimentation. Before we go into the
details of mono-morphism, bi-morphism and poly-morphism, I
would like to show you some code that you can use today to
demonstrate to your friends and colleagues that
Java is "infinitely" faster than C++ / C# / Pascal / [their
favourite language here]. Infinitely faster, that is, at
doing nothing :-)
Let's take the following class FastJavaFast. It will call a
method 1000000000000000000000000000 times in under a
second. That is 10 to the power of 27. The method will not
just be any method, but an overridden method (like virtual
methods in C++). You need to start the program with the
-server flag in Java 6.
public class FastJavaFast {
public void test() {
for (int i = 0; i < 1000 * 1000 * 1000; i++) {
test2();
}
}
public void test2() {
for (int i = 0; i < 1000 * 1000 * 1000; i++) {
test3();
}
}
public void test3() {
for (int i = 0; i < 1000 * 1000 * 1000; i++) {
foo();
}
}
// this is a "virtual" method, in the C++ sense
public void foo() {
}
public static void main(String[] args) {
FastJavaFast fast = new FastJavaFast() {
// I am overriding the foo() method
public void foo() {
// do nothing
}
};
long time = System.currentTimeMillis();
fast.foo();
fast.test3();
fast.test2();
fast.test();
time = System.currentTimeMillis() - time;
System.out.println("time = " + time);
}
}
On my old Dell D800 notebook, I can execute the test in just
a couple of milliseconds! Calling an overridden method
1,000,000,000,000,000,000,000,000,000 in under a
second is simply blazingly fast! If we keep on calling
test(), it eventually executes in 0 milliseconds. Swallow
that one, C#! No other language can do nothing as
fast as Java :-)
Now for the explanation. The first time that a method is
run, the HotSpot Compiler tries to optimize the code. It
uses something called On-Stack-Replacement (OSR), where it
replaces some of the byte codes with optimized code. This is
not the best optimization, but a start. If you call the
method a second time, the newly optimized method is called.
Since we call the foo() method first, the HotSpot compiler
determines that there is only one implementation of the
method being called. When we call the test3() method, it
has already determined that foo() can be inlined, and it
eliminates the loop altogether. We then call test2(), where
it again has already determined that test3() does nothing,
thus eliminating the loop in test2(). Lastly, we call test()
and similarly, the loop is optimized out.
You can test this by inlining the calls to test2()
and test3(), thus having a method test() such as:
public void test() {
for (int i = 0; i < 1000 * 1000 * 1000; i++) {
for (int j = 0; j < 1000 * 1000 * 1000; j++) {
for (int k = 0; k < 1000 * 1000 * 1000; k++) {
foo();
}
}
}
}
I estimate this should take approximately 79,274,479,959
years to complete on my Dell ...
The amazing performance of inlining code should affect how we
code Java. We should stay clear of long methods and rather
break them up more. Lots of small methods usually result in
less duplicate copy and paste. Since the inlining is done
for you by the Server HotSpot Compiler, there is no cost in
calling the method.
This was one of the reasons for the strange results in my
previous newsletter,
where the initial results were faster than subsequent runs.
Instead of calling a test method many times in our main()
method, we should rather extract that code into a separate
method and call that from main().
Thanks to Cliff Click's input, I changed my test code to
show these effects. First off, instead of hard-coding the
instances, I generate a bunch of random B subclasses and put
them into an array. In my test, I then iterate over the
A instances and call the run() method. Instead of showing
you all the classes again, I will just show you one example
and refer you to my website for a zip file
containing all the sources.
We start with a common interface for all the tests, called
Test.java. The run() method calls B's foo() function, which
might be a subclass instance. The description() returns a
String showing what the test is doing.
public interface Test {
public void run();
public String description();
}
For example, here we have A3, B3, C3:
public class A3 implements Test {
private final B3 b;
public A3(B3 b) {
this.b = b;
}
public void run() {
b.f();
}
public String description() {
return "Bi-Morphic: Two subclasses, via interface";
}
}
public interface B3 {
public void f();
}
public class C3 implements B3 {
public void f() {
}
}
In this new test class, we choose the test at start-up with
a command-line argument. Depending on the test chosen, we
generate instances of different test cases. The command line
arguments are:
0 - No-morphic: Single subclass, pointed to directly
1 - Mono-Morphic: Single subclass, via interface
2 - Bi-Morphic: Two subclasses, via interface
3 - Poly-Morphic: Three subclasses, via interface
4 - Poly-Morphic: Four subclasses, via interface
5 - Poly-Morphic: Eight subclasses, via interface
We generate the test data by making instances of the relevant
classes (A1, A2, etc.) with the corresponding B* classes and
subclasses. For A4, the interface would be B4 and the
implementations of B4 would be C4, D4, E4. When we run the
test, we pick a random set of 10 instance, and then call the
run() method on those. In order to "warm-up" the HotSpot
Server Compiler, we run through all of the tests first, to
let the compiler settle. Once that is completed, we do the
actual runs. You can see from our results below that our
standard deviation is rather low.
import java.lang.reflect.Constructor;
import java.util.Random;
public class PolymorphismCliffRandomTest {
private static final int UPTO = 100 * 1000 * 1000;
public static void main(String[] args) throws Exception {
int offset = Integer.parseInt(args[0]);
Test[] tests = generateTestData(offset);
printDescriptions(tests);
System.out.println("Warmup");
testAll(tests);
System.out.println("Actual run");
printHeader(tests);
testAll(tests);
}
private static void testAll(Test[] tests) {
for (int j = 0; j < 10; j++) {
runTests(tests);
}
}
private static void printDescriptions(Test[] tests) {
System.out.println(tests[0].getClass().getSimpleName() +
": " + tests[0].description());
System.out.println();
}
public static void runTests(Test[] tests) {
long time = System.currentTimeMillis();
test(tests);
time = System.currentTimeMillis() - time;
System.out.print(time + "\t");
System.out.flush();
System.out.println();
}
public static void test(Test[] sources) {
Test t0 = makeRandomTest(sources);
Test t1 = makeRandomTest(sources);
Test t2 = makeRandomTest(sources);
Test t3 = makeRandomTest(sources);
Test t4 = makeRandomTest(sources);
Test t5 = makeRandomTest(sources);
Test t6 = makeRandomTest(sources);
Test t7 = makeRandomTest(sources);
Test t8 = makeRandomTest(sources);
Test t9 = makeRandomTest(sources);
for (int i = 0; i < UPTO / 10; i++) {
t0.run();
t1.run();
t2.run();
t3.run();
t4.run();
t5.run();
t6.run();
t7.run();
t8.run();
t9.run();
}
}
private static Test makeRandomTest(Test[] sources) {
return sources[((int) (Math.random() * sources.length))];
}
private static void printHeader(Test[] tests) {
System.out.print(tests[0].getClass().getSimpleName());
System.out.print('\t');
System.out.println();
}
private static Test[] generateTestData(int offset)
throws Exception {
switch (offset) {
default:
throw new IllegalArgumentException("offset:" + offset);
case 0:
return fillSources(A1.class, B1.class, B1.class);
case 1:
return fillSources(A2.class, B2.class, C2.class);
case 2:
return fillSources(
A3.class, B3.class, C3.class, D3.class);
case 3:
return fillSources(
A4.class, B4.class, C4.class, D4.class, E4.class);
case 4:
return fillSources(
A5.class, B5.class, C5.class, D5.class, E5.class,
F5.class);
case 5:
return fillSources(
A6.class, B6.class, C6.class, D6.class, E6.class,
F6.class, G6.class, I6.class, J6.class);
}
}
private static Test[] fillSources(
Class<? extends Test> aClass,
Class<?> bClass, Class<?>... bClasses)
throws Exception {
Test[] sources = new Test[1000];
Random rand = new Random(0);
Constructor<? extends Test> constr =
aClass.getConstructor(bClass);
for (int i = 0; i < sources.length; i++) {
int offset = Math.abs(rand.nextInt() % bClasses.length);
Object b = bClasses[offset].newInstance();
sources[i] = constr.newInstance(b);
}
return sources;
}
}
We run this six times, one for each test case. Here are the
test cases:
A1: No-morphic: Single subclass, pointed to directly
A2: Mono-Morphic: Single subclass, via interface
A3: Bi-Morphic: Two subclasses, via interface
A4: Poly-Morphic: Three subclasses, via interface
A5: Poly-Morphic: Four subclasses, via interface
A6: Poly-Morphic: Eight subclasses, via interface
On my server, the average results are:
A1 A2 A3 A4 A5 A6
0 (0) 0 (0) 495 (9) 2855 (31) 2838 (18) 2837 (10)
When you run it with Java 6 -server, A1 and A2 come back with
an average of 0 (0). The standard deviation is in brackets.
In the case of a simple pointer to B or where there is a
single subclass of B, the method call is inlined.
A3 is the bi-morphic case, where we have two subclasses of B.
In this case, the results come back as 495 (9). The Server
HotSpot compiler, according to Cliff Click, deals with
bi-morphism as a special case of poly-morphism: "Where the
server compiler can prove only two classes reach a
call site, it will insert a type-check and then statically
call both targets (which may then further inline, etc)."
The cases A4, A5 and A6 are all where we have more than
two subclasses of B. In these cases, the results come back
as 2855 (31), 2838 (18) and 2837 (10). They are pretty much
equal and much slower than the bi-morphic case. According to
Cliff Click: "HotSpot uses an inline-cache for calls where
the compiler cannot prove only a single target can be called.
An inline-cache turns a virtual (or interface) call into a
static call plus a few cycles of work. It's is a 1-entry
cache inlined in the code; the Key is the expected class of
the 'this' pointer, the Value is the static target method
matching the Key, directly encoded as a call instruction. As
soon as you need 2+ targets for the same call site, you
revert to the much more expensive dynamic lookup
(load/load/load/jump-register). Both compilers use the same
runtime infrastructure, but the server compiler is more
aggressive about proving a single target."
It was nice having my discoveries confirmed by the guy who
wrote the JVM Server Hotspot Compiler :-) Thanks to Cliff
Click for taking the time to explain how it works. We also
saw that Java is really fast at doing nothing, so be extra
careful with empty loops that you inserted into your code
instead of Thread.sleep(). Since Java 6, they are in danger
of being removed.
We see again how good design wins: short methods, lots of
polymorphism. If we do not need the polymorphism, it is
optimized out of our code. Thus the penalty for using it
is extremely small, usually non-existant where it counts.
I am in the process of expanding my Java Design Patterns
Course to include more patterns. This has been one of my
most popular courses to date. If this would interest you,
please let me know
:-).
Kind regards
Heinz
P.S. In my last newsletter, I mentioned that we were trying
to buy an olive grove on which to build our family house. We
managed to finally buy it yesterday. If you can believe this,
the lawyers are on strike in Greece at the moment. Even the
kids at school strike! They say that you have not
experienced true Greece if you have not been inconvenienced
by at least one strike whilst on holiday here ;-)
Performance Articles
Related Java Course
|