The Java Specialists' Newsletter
Issue 179 2009-12-30
Category: Performance
Java version: 5+
Escape Analysis
by Dr. Heinz M. Kabutz
Abstract:
Escape analysis can make your code run 110 times faster -
if you are a really really bad programmer to begin with :-)
In this newsletter we look at some of the places where
escape analysis can potentially help us.
A hearty welcome to the 179th edition of The Java(tm) Specialists' Newsletter, sent to you
from the beautiful warm and sunny island of Crete. Whilst
my wife and oldest daughter are spending a week shivering in
England, I'm left playing mom and dad back in Crete with our
other two. Rather challenging finding time to research the
topic for this month amongst cooking and various domestic
tasks. If I did this job permanently I'd lose 20 pounds in
the first month.
So, before we end 2009, here is one more little newsletter
for all the diehard Java programmers who are still sitting
behind their desks despite the many holidays.
Escape Analysis
Escape analysis has been hailed as a solution to GC problems
for the last few years. Here are some articles that explain
what it is and how it can help us.
Sun Microsystems, 2009:
Java SE 6u14 Update Release Notes
From the release notes:
The -XX:+DoEscapeAnalysis option directs HotSpot to look
for objects that are created and referenced by a single
thread within the scope of a method compilation.
Allocation is omitted for such non-escaping objects, and
their fields are treated as local variables, often
residing in machine registers. Synchronization on
non-escaping objects is also elided.
Dan Dyer, 2009:
Escape Analysis in Java 6 Update 14 - Some Informal Benchmarks
Wolfgang Laun, 2009:
Pointed me to the latest flags in update 14 in a private email.
Tinou Bao, 2009:
Lock Coarsening, Biased Locking, Escape Analysis for Dummies
(my favourite article on the subject)
Jeroen Borgers, 2007:
Did escape analysis escape from Java 6?
Brian Goetz, 2005:
Java theory and practice: Urban performance legends, revisited
It is easy to be confused by the results of the escape
analysis flag since it does two things. It can
omit constructing objects that do not escape, even keeping
them in CPU registers. However, it also avoids
synchronization on non-escaping objects. This can
skew the results of microbenchmarks to make it seem that
escape analysis is better than it really is.
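To make the two effects concrete, here is a minimal sketch of what
"escaping" means. The class and method names (EscapeDemo, Point,
nonEscaping, escaping, lockOnLocal) are made up for illustration and
are not taken from the release notes. Whether HotSpot actually removes
the allocation or the lock depends on the JVM version, flags and
inlining decisions.

```java
// Illustrative sketch only; all names in this class are hypothetical.
class EscapeDemo {
  // Simple value holder used by the methods below.
  static final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
  }

  // Publishing an object through a static field makes it escape.
  static Point lastPoint;

  // The Point never leaves this method, so it is a candidate for
  // having its allocation omitted.
  static int nonEscaping(int a, int b) {
    Point p = new Point(a, b);
    return p.x + p.y;
  }

  // The Point is stored in a static field, so other threads could
  // see it. It escapes and has to live on the heap.
  static int escaping(int a, int b) {
    Point p = new Point(a, b);
    lastPoint = p;
    return p.x + p.y;
  }

  // The lock object is only reachable from the current thread, so
  // the synchronization is a candidate for being elided.
  static int lockOnLocal(int a, int b) {
    Object lock = new Object();
    synchronized (lock) {
      return a + b;
    }
  }

  public static void main(String[] args) {
    System.out.println("nonEscaping = " + nonEscaping(1, 2));
    System.out.println("escaping = " + escaping(1, 2));
    System.out.println("lockOnLocal = " + lockOnLocal(1, 2));
  }
}
```

All three methods return the same value. The only difference is
whether the object they create is visible outside the method.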
One way to tell the difference is to log the GC output. If a
benchmark runs faster with escape analysis turned on and
the GC output (-Xloggc:file.gc) is the same, the speedup is
most likely the result of lock eliding rather than of omitted
allocations.
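If you would rather not compare GC log files by hand, the collection
counts can also be read from inside the JVM. The following is a rough
sketch using the standard java.lang.management API. The class name
GcCounter and the workload method are made up for illustration, and
the collection count is only a coarse proxy for what the GC log shows.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch only; GcCounter and workload are hypothetical.
class GcCounter {
  // Sums the collection counts of all collectors in this JVM.
  // A collector may report -1 if the count is undefined.
  static long totalCollections() {
    long total = 0;
    for (GarbageCollectorMXBean gc :
        ManagementFactory.getGarbageCollectorMXBeans()) {
      long count = gc.getCollectionCount();
      if (count > 0) total += count;
    }
    return total;
  }

  // Creates one small, non-escaping array per iteration.
  static long workload(int iterations) {
    long total = 0;
    for (int i = 0; i < iterations; i++) {
      int[] pair = new int[] { i, i / 2 };
      total += pair[0] + pair[1];
    }
    return total;
  }

  public static void main(String[] args) {
    long before = totalCollections();
    long result = workload(50000000);
    long after = totalCollections();
    System.out.println("result = " + result);
    System.out.println("collections = " + (after - before));
  }
}
```

Run it once with -XX:+DoEscapeAnalysis and once with
-XX:-DoEscapeAnalysis. If the time changes but the number of
collections does not, the allocations were probably not what made
the difference.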
Writing a benchmark to specifically test escape analysis is
rather difficult. After more than a dozen dead ends, I came
up with this one whilst taking my 3 year old for a walk
around the countryside this morning. Escape analysis seems
to give us the best performance gains with poorly written
code, such as when we create lots of unnecessary objects.
For example, here is a Calculator that adds two ints
together. Even though the method add() could be static, we
wrote it as non-static to demonstrate the power of escape
analysis.
public class Calculator {
  public int add(int i0, int i1) {
    return i0 + i1;
  }
}
In our poorly written CalculatorTest, we construct a new
Calculator object every time we call the add method. You
would hopefully agree that this is a harebrained way of
coding Java. We use the return value of the calculation
to ensure that the whole calculation is not optimized away.
public class CalculatorTest {
  public static void main(String[] args) {
    long time = System.currentTimeMillis();
    long grandTotal = 0;
    for (int i = 0; i < 100000; i++) {
      grandTotal += test();
    }
    time = System.currentTimeMillis() - time;
    System.out.println("time = " + time + "ms");
    System.out.println("grandTotal = " + grandTotal);
  }

  private static long test() {
    long total = 0;
    for (int i = 0; i < 10000; i++) {
      Calculator calc = new Calculator();
      total += calc.add(i, i/2);
    }
    return total;
  }
}
I ran this with an old Java 1.6.0_03 32-bit server JVM on my
Mac (Soylatte)
and the latest 1.6.0_17 64-bit server JVM. Escape analysis
has only been officially available since 1.6.0_14, so I
could not use it for Soylatte.
EA on    EA off    Old Java
2.2s     8.6s      7.0s
With escape analysis turned off, we constructed 15 GB of
heap objects. Even though GC took only 2% of the CPU, we
still created 1.3 GB of objects per second. We call this
object churn.
The 32-bit Soylatte JVM constructed 7.6 GB of objects. Since
its objects are half the size of the 64-bit objects, 8 bytes
as opposed to 16 bytes, these values make sense. I tried to
compress the OOPs using the new -XX:+UseCompressedOops flag,
but that did not seem to make much difference. The objects
are still each 16 bytes, according to the GC logs.
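The arithmetic behind these figures can be checked in a few lines.
This sketch only multiplies out the loop counts from CalculatorTest
with the object sizes quoted above. The class name ChurnArithmetic is
made up, and the measured GC figures will differ slightly since the
JVM allocates other objects too.

```java
// Illustrative sketch only; ChurnArithmetic is a hypothetical name.
class ChurnArithmetic {
  // Outer loop (100000) times inner loop (10000) from CalculatorTest.
  static long allocations() {
    return 100000L * 10000L;
  }

  // Total bytes allocated for a given object size.
  static long bytes(long objects, int bytesPerObject) {
    return objects * bytesPerObject;
  }

  // Converts bytes to binary gigabytes (2^30 bytes).
  static double gigabytes(long bytes) {
    return bytes / (1024.0 * 1024.0 * 1024.0);
  }

  public static void main(String[] args) {
    long objects = allocations();
    System.out.println("objects = " + objects);
    System.out.println("64-bit GB = " + gigabytes(bytes(objects, 16)));
    System.out.println("32-bit GB = " + gigabytes(bytes(objects, 8)));
  }
}
```

One billion Calculator objects at 16 bytes each comes to a little
under 15 GB, and at 8 bytes each to about half of that, which is in
the same region as the numbers from the GC logs.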
As I said before, escape analysis helps us to improve
performance of poorly written code. Instead of taking
8.6 seconds, we only take 2.2 seconds. However, if we change
our code to reuse the Calculator object, or even make the add
method static, then the test executes in under a second.
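For comparison, here is a sketch of the two fixes mentioned above:
reusing one Calculator and using a static add method. The class name
CalculatorReuseTest and the helper names are made up. The nested
Calculator mirrors the one shown earlier so that the example is
self-contained.

```java
// Illustrative sketch only; names other than Calculator.add are
// hypothetical.
class CalculatorReuseTest {
  // Same logic as the Calculator above, nested to keep this
  // example in a single file.
  static class Calculator {
    int add(int i0, int i1) {
      return i0 + i1;
    }
  }

  // Static variant: no object is needed at all.
  static int addStatic(int i0, int i1) {
    return i0 + i1;
  }

  // Creates the Calculator once, outside the loop.
  static long testReuse() {
    Calculator calc = new Calculator();
    long total = 0;
    for (int i = 0; i < 10000; i++) {
      total += calc.add(i, i / 2);
    }
    return total;
  }

  // Calls the static method instead of constructing anything.
  static long testStatic() {
    long total = 0;
    for (int i = 0; i < 10000; i++) {
      total += addStatic(i, i / 2);
    }
    return total;
  }

  public static void main(String[] args) {
    long time = System.currentTimeMillis();
    long grandTotal = 0;
    for (int i = 0; i < 100000; i++) {
      grandTotal += testReuse();
    }
    time = System.currentTimeMillis() - time;
    System.out.println("reuse time = " + time + "ms");

    time = System.currentTimeMillis();
    for (int i = 0; i < 100000; i++) {
      grandTotal += testStatic();
    }
    time = System.currentTimeMillis() - time;
    System.out.println("static time = " + time + "ms");
    System.out.println("grandTotal = " + grandTotal);
  }
}
```

Neither variant relies on escape analysis, so the timings should be
much the same whether the flag is on or off.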
An interesting application of escape analysis is with
varargs. In one of my many experiments, I found that array
objects only benefit from escape analysis when they have
64 elements or fewer. So if you write really really bad code
with more than 64 arguments for a varargs call, then your
program will slow down to a crawl. Here is some sample code:
public class VarArgsTest {
  public static void main(String[] args) {
    long time = System.currentTimeMillis();
    long grandTotal = 0;
    for (int i = 0; i < 100000; i++) {
      grandTotal += test();
    }
    time = System.currentTimeMillis() - time;
    System.out.println("time = " + time + "ms");
    System.out.println("grandTotal = " + grandTotal);
  }

  private static long test() {
    long total = 0;
    for (int i = 0; i < 10000; i++) {
      total += test(
          i, 1, 2, 3, 4, 5, 6, 7, 8, 9,
          i, 1, 2, 3, 4, 5, 6, 7, 8, 9,
          i, 1, 2, 3, 4, 5, 6, 7, 8, 9,
          i, 1, 2, 3, 4, 5, 6, 7, 8, 9,
          i, 1, 2, 3, 4, 5, 6, 7, 8, 9,
          i, 1, 2, 3, 4, 5, 6, 7, 8, 9,
          i, 1, 2, 3
      );
    }
    return total;
  }

  public static int test(int... args) {
    return args[0] + args.length;
  }
}
When we run this with 64 parameters, we get the following
results, where escape analysis just made our code 110x
faster:
EA on    EA off    Old Java
1.2s     140s      120s
Since escape analysis only seems to work with arrays of
64 elements or fewer, if we add a single parameter to the
method call above, it slows down to a crawl:
EA on    EA off    Old Java
150s     150s      120s
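The same threshold can be probed without varargs by allocating the
arrays explicitly. The sketch below is an assumption-laden
experiment, not a definitive test: the class and method names are
made up, and the iteration count is kept small so that it finishes
quickly. HotSpot's server compiler has a flag named
-XX:EliminateAllocationArraySizeLimit whose default of 64 would be
consistent with the results above, but whether that flag exists in
1.6.0_17 has not been checked here.

```java
// Illustrative sketch only; all names in this class are hypothetical.
class ArraySizeThresholdTest {
  // 64 elements: at the limit observed in the varargs experiment.
  static int sum64(int i) {
    int[] a = new int[64];
    a[0] = i;
    return a[0] + a.length;
  }

  // 65 elements: one past the observed limit.
  static int sum65(int i) {
    int[] a = new int[65];
    a[0] = i;
    return a[0] + a.length;
  }

  static long run64(int iterations) {
    long total = 0;
    for (int i = 0; i < iterations; i++) {
      total += sum64(i);
    }
    return total;
  }

  static long run65(int iterations) {
    long total = 0;
    for (int i = 0; i < iterations; i++) {
      total += sum65(i);
    }
    return total;
  }

  public static void main(String[] args) {
    int iterations = 10000000;
    long time = System.currentTimeMillis();
    long total = run64(iterations);
    time = System.currentTimeMillis() - time;
    System.out.println("int[64] time = " + time + "ms");

    time = System.currentTimeMillis();
    total += run65(iterations);
    time = System.currentTimeMillis() - time;
    System.out.println("int[65] time = " + time + "ms");
    System.out.println("total = " + total);
  }
}
```

If the int[65] run is much slower than the int[64] run only when
escape analysis is turned on, that points to the same size limit as
the varargs test.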
We will probably never write such bad code, taking 65
parameters. However, knowing that there is special treatment
for arrays of 64 elements or fewer means that we need to take
that into account when writing our benchmarks. For example,
a benchmark that adds three Strings together using a
StringBuilder or StringBuffer is only representative if the
total length can sometimes exceed 64 characters.
Let's apply the knowledge of vararg improvements to a
better Calculator, now called CalculatorVarArgs:
public class CalculatorVarArgs {
  public int add(int... is) {
    if (is.length == 0) throw new IllegalArgumentException();
    if (is.length == 1) return is[0];
    if (is.length == 2) return is[0] + is[1];
    if (is.length == 3) return is[0] + is[1] + is[2];
    if (is.length == 4) return is[0] + is[1] + is[2] + is[3];
    int total = 0;
    for (int i : is) {
      total += i;
    }
    return total;
  }
}
Note the rather convoluted syntax for dealing with cases
where the length of the array is less than 5. I would've
thought that the loops would be unrolled automatically.
It does seem to make a rather large performance difference,
so look at this if a vararg method is your bottleneck.
With a small test, we see that the varargs array is also not
created on the Java heap, as long as the array is small
enough:
public class CalculatorVarargTest {
  public static void main(String[] args) {
    long time = System.currentTimeMillis();
    long grandTotal = 0;
    for (int i = 0; i < 100000; i++) {
      grandTotal += test();
    }
    time = System.currentTimeMillis() - time;
    System.out.println("time = " + time + "ms");
    System.out.println("grandTotal = " + grandTotal);
  }

  private static long test() {
    long total = 0;
    for (int i = 0; i < 10000; i++) {
      CalculatorVarArgs calc = new CalculatorVarArgs();
      total += calc.add(i, i/2);
    }
    return total;
  }
}
As you can see in our results, the varargs version is much
slower than the plain Calculator when escape analysis is
turned off (23s instead of 8.6s), but just as fast when it
is turned on (2.2s).
EA on    EA off    Old Java
2.2s     23s       18s
Java SciMark 2.0
It's quite interesting running the SciMark benchmark using
the various escape analysis settings. The biggest difference
is with the Monte Carlo calculation. On my laptop, it scores
410 with EA on and 300 with EA off (higher is better).
However, there are hardly any objects collected, so my
suspicion is that the performance improvement is coming from
synchronization eliding.
Real-Life Escape Analysis Improvements
Just because some microbenchmarks run 110x faster does not
mean that our real application code is going to perform
better, unless we are really bad programmers.
On the other hand, perhaps now with escape analysis and lock
eliding in place, we have lowered the bar of what a good
programmer should be able to figure out? For example, as
Jeroen Borgers correctly points out in his article,
StringBuilder is now obsolete, after just one Java version.
The benefit we had with StringBuilder was as an
unsynchronized version of StringBuffer. However, with lock
eliding we should not need to concern ourselves with this
anymore. Even Vector might become fashionable again.
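A quick way to try this claim for yourself is to time the same
concatenation with both classes. The sketch below is a hypothetical
micro-benchmark (the class name LocalBufferTest and its methods are
made up), with all the usual caveats about microbenchmarks. In both
methods the buffer never leaves the method, so the StringBuffer locks
are candidates for eliding.

```java
// Illustrative sketch only; all names in this class are hypothetical.
class LocalBufferTest {
  // Synchronized appends on a buffer that only this thread can see.
  static String joinWithBuffer(String a, String b, String c) {
    StringBuffer sb = new StringBuffer();
    sb.append(a).append(b).append(c);
    return sb.toString();
  }

  // Same work without any synchronization.
  static String joinWithBuilder(String a, String b, String c) {
    StringBuilder sb = new StringBuilder();
    sb.append(a).append(b).append(c);
    return sb.toString();
  }

  public static void main(String[] args) {
    int iterations = 10000000;
    long lengths = 0;

    long time = System.currentTimeMillis();
    for (int i = 0; i < iterations; i++) {
      lengths += joinWithBuffer("Hello", " ", "World").length();
    }
    time = System.currentTimeMillis() - time;
    System.out.println("StringBuffer time = " + time + "ms");

    time = System.currentTimeMillis();
    for (int i = 0; i < iterations; i++) {
      lengths += joinWithBuilder("Hello", " ", "World").length();
    }
    time = System.currentTimeMillis() - time;
    System.out.println("StringBuilder time = " + time + "ms");
    System.out.println("lengths = " + lengths);
  }
}
```

If lock eliding works as advertised, the two timings should be close
with escape analysis turned on and further apart with it turned off.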
Non-Escaping Object Storage
The Java release notes indicate that the fields of
non-escaping objects are treated as local variables, maybe
even being stored in machine registers. However, I have not
managed to write a benchmark that demonstrates this. If such
objects were stored on the stack, we would expect to run out
of stack space more quickly with escape analysis enabled.
However, that does not seem to be the case, even if
we have larger objects, such as int[64]. It would be
interesting to see a benchmark that shows how or where we
can run out of resources differently with escape analysis
enabled. A job for another day ...
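One possible starting point for such an experiment is to measure how
deep we can recurse when every frame creates a non-escaping int[64].
This is only a sketch of the idea, not the benchmark used for this
newsletter. The class name StackDepthProbe and its methods are made
up, and the depth reached depends on the stack size, on whether the
method has been compiled yet, and on the JVM version.

```java
// Illustrative sketch only; all names in this class are hypothetical.
class StackDepthProbe {
  // Deepest frame reached before the stack overflowed.
  static int depth;

  // Each frame creates an int[64] that never leaves the frame.
  static int recurse(int n) {
    int[] local = new int[64];
    local[n & 63] = n;
    depth = n;
    return recurse(n + 1) + local[n & 63];
  }

  // Recurses until the stack overflows and reports the depth.
  static int maxDepth() {
    depth = 0;
    try {
      recurse(1);
    } catch (StackOverflowError expected) {
      // Running out of stack is the point of the experiment.
    }
    return depth;
  }

  public static void main(String[] args) {
    // Repeat a few times so that the JIT has a chance to compile.
    for (int i = 0; i < 5; i++) {
      System.out.println("max depth = " + maxDepth());
    }
  }
}
```

Comparing the reported depths with escape analysis turned on and off
would show whether the arrays take up stack space in each frame.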
My son asked me what type of weather we can expect tomorrow
in Crete, considering that winter has officially started.
Looks like 26 Celsius and sunny. Just thought you'd like to
know that information :-)))
Kind regards
Heinz