The Java Specialists' Newsletter
Issue 068 2003-04-21
Category: Performance
Java version:
Appending Strings
by Dr. Heinz M. Kabutz
Welcome to the 68th edition of The Java(tm) Specialists' Newsletter, sent to 6400 Java
Specialists in 95 countries.
Since our last newsletter, we have had two famous Java authors
join the ranks of subscribers. It gives me great pleasure to
welcome Mark Grand and Bill Venners to our list of
subscribers.
Mark is famous for his three volumes of Java Design Patterns
books. You will notice that I quote Mark in the brochure
of my Design Patterns course. Bill is famous for his book
Inside The Java Virtual Machine.
Bill also does a lot of work training with Bruce Eckel.
Our last newsletter on BASIC Java
produced gasps of disbelief. Some readers
told me that they now wanted to unsubscribe, which of course I
supported 100%. Others enjoyed it with me. It was meant in
humour, as the warnings at the beginning of the newsletter clearly
indicated.
Appending Strings
The first code that I look for when I am asked to find out why
some code is slow is concatenation of Strings. When we concatenate
Strings with += a whole lot of objects are constructed.
Before we can look at an example, we need to define a Timer class
that we will use for measuring performance:
/**
 * Class used to measure the time that a task takes to execute.
 * The method "time" prints out how long it took and returns
 * the time.
 */
public class Timer {
  /**
   * This method runs the Runnable and measures how long it takes
   * @param r is the Runnable for the task that we want to measure
   * @return the time it took to execute this task
   */
  public static long time(Runnable r) {
    long time = -System.currentTimeMillis();
    r.run();
    time += System.currentTimeMillis();
    System.out.println("Took " + time + "ms");
    return time;
  }
}
In the test case, we have three tasks that we want to measure.
The first is a simple += String append, which turns out to be
extremely slow. The second creates a StringBuffer and calls
the append method of StringBuffer. The third method creates
the StringBuffer with the correct size and then appends to
that. After I have presented the code, I will explain what
happens and why.
public class StringAppendDiff {
  public static void main(String[] args) {
    System.out.println("String += 10000 additions");
    Timer.time(new Runnable() {
      public void run() {
        String s = "";
        for(int i = 0; i < 10000; i++) {
          s += i;
        }
        // we have to use "s" in some way, otherwise a clever
        // compiler would optimise it away. Not that I have
        // any such compiler, but just in case ;-)
        System.out.println("Length = " + s.length());
      }
    });

    System.out.println(
      "StringBuffer 300 * 10000 additions initial size wrong");
    Timer.time(new Runnable() {
      public void run() {
        StringBuffer sb = new StringBuffer();
        for(int i = 0; i < (300 * 10000); i++) {
          sb.append(i);
        }
        String s = sb.toString();
        System.out.println("Length = " + s.length());
      }
    });

    System.out.println(
      "StringBuffer 300 * 10000 additions initial size right");
    Timer.time(new Runnable() {
      public void run() {
        StringBuffer sb = new StringBuffer(19888890);
        for(int i = 0; i < (300 * 10000); i++) {
          sb.append(i);
        }
        String s = sb.toString();
        System.out.println("Length = " + s.length());
      }
    });
  }
}
This program does use quite a bit of memory, so you should set
the maximum heap size to be quite large, for example
256MB. You can do that with the -Xmx256m flag.
When we run this program, we get the following output:
String += 10000 additions
Length = 38890
Took 2203ms
StringBuffer 300 * 10000 additions initial size wrong
Length = 19888890
Took 2254ms
StringBuffer 300 * 10000 additions initial size right
Length = 19888890
Took 1562ms
The StringBuffer tests perform 300 times as many appends as the
+= test in roughly the same time, so per append, using StringBuffer
directly is about 300 times faster than using +=.
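A rough way to see where the time goes is to count the characters that get copied. The sketch below does no timing, only arithmetic. It assumes that every s += i copies the whole String built so far exactly once before the new digits are added, which is a simplification; the class name CopyCount and its method are made up for this illustration.

```java
public class CopyCount {
  /**
   * Estimates how many already-appended characters are copied when
   * the ints 0..n-1 are appended with +=.
   * Assumption: each += copies the String built so far exactly once.
   */
  public static long copiedWithPlusEquals(int n) {
    long copied = 0;
    long length = 0; // length of the String before this iteration
    for (int i = 0; i < n; i++) {
      copied += length;
      length += String.valueOf(i).length();
    }
    return copied;
  }

  public static void main(String[] args) {
    System.out.println("Characters copied for 10000 appends: "
        + copiedWithPlusEquals(10000));
  }
}
```

Under this assumption the amount of copying grows roughly with the square of the number of appends, whereas a single StringBuffer stores each character once and only copies when its internal array has to grow.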
Another observation that we can make is that if we set
the initial size to be correct, it only takes 1562ms
instead of 2254ms. This is because of the way that
java.lang.StringBuffer works. When you create a new
StringBuffer, it creates a char[] of size 16. When
you append and there is no space left in the char[],
a new char[] of roughly double the size is allocated and
the old contents are copied across. This means that if you
size it first, you will reduce the number of char[]s
that are constructed and copied.
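The growth can be observed from outside through StringBuffer.capacity(). The following is a minimal sketch, not a benchmark: it counts how often the capacity changes while the ints 0 to 9999 are appended, once with the default constructor and once with the buffer sized to the final length of 38890. The class and method names are hypothetical, and the exact number of resizes for the default case depends on the growth policy of the JDK in use.

```java
public class CapacityGrowth {
  /** Counts how often the capacity changes while appending 0..n-1. */
  public static int countResizes(StringBuffer sb, int n) {
    int resizes = 0;
    int capacity = sb.capacity();
    for (int i = 0; i < n; i++) {
      sb.append(i);
      if (sb.capacity() != capacity) {
        // the internal array was replaced by a bigger one
        capacity = sb.capacity();
        resizes++;
      }
    }
    return resizes;
  }

  public static void main(String[] args) {
    System.out.println("Initial capacity = "
        + new StringBuffer().capacity());
    System.out.println("Resizes, default size = "
        + countResizes(new StringBuffer(), 10000));
    System.out.println("Resizes, presized = "
        + countResizes(new StringBuffer(38890), 10000));
  }
}
```

Every resize means a new array plus a copy of everything appended so far, which is the work that the correctly sized buffer avoids.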
The time that the += String append takes is dependent
on the compiler that you use to compile the code. I
discovered this accidentally during my Java course last
week, and much to my embarrassment, I did not know why
this was. If you compile it from within Eclipse, you get
the result above, and if you compile it with Sun's
javac, you get the output below. Eclipse does not use
javac or jikes to compile the code; it has its own
internal compiler.
String += 10000 additions
Length = 38890
Took 7912ms
StringBuffer 300 * 10000 additions initial size wrong
Length = 19888890
Took 2634ms
StringBuffer 300 * 10000 additions initial size right
Length = 19888890
Took 1822ms
Why the difference between compilers?
This took some head-scratching, resulting in my fingers
being full of wood splinters. I started by writing a
class that did the basic String append with +=.
public class BasicStringAppend {
  public BasicStringAppend() {
    String s = "";
    for(int i = 0; i < 100; i++) {
      s += i;
    }
  }
}
When in doubt about what the compiler does, disassemble
the classes. Even when I disassembled them, it took a
while before I figured out what the difference was and
why it was important. Apart from the constant pool indices,
the two listings differ only at offsets 16 to 20.
You can disassemble a class with the
tool javap that is in the bin directory of
your Java installation. Use the -c parameter:
javap -c BasicStringAppend
Compiled with Eclipse:
Compiled from BasicStringAppend.java
public class BasicStringAppend extends java.lang.Object {
public BasicStringAppend();
}
Method BasicStringAppend()
0 aload_0
1 invokespecial #9 <Method java.lang.Object()>
4 ldc #11 <String "">
6 astore_1
7 iconst_0
8 istore_2
9 goto 34
12 new #13 <Class java.lang.StringBuffer>
15 dup
16 aload_1
17 invokestatic #19 <Method java.lang.String valueOf(java.lang.Object)>
20 invokespecial #22 <Method java.lang.StringBuffer(java.lang.String)>
23 iload_2
24 invokevirtual #26 <Method java.lang.StringBuffer append(int)>
27 invokevirtual #30 <Method java.lang.String toString()>
30 astore_1
31 iinc 2 1
34 iload_2
35 bipush 100
37 if_icmplt 12
40 return
Compiled with Sun's javac:
Compiled from BasicStringAppend.java
public class BasicStringAppend extends java.lang.Object {
public BasicStringAppend();
}
Method BasicStringAppend()
0 aload_0
1 invokespecial #1 <Method java.lang.Object()>
4 ldc #2 <String "">
6 astore_1
7 iconst_0
8 istore_2
9 goto 34
12 new #3 <Class java.lang.StringBuffer>
15 dup
16 invokespecial #4 <Method java.lang.StringBuffer()>
19 aload_1
20 invokevirtual #5 <Method java.lang.StringBuffer append(java.lang.String)>
23 iload_2
24 invokevirtual #6 <Method java.lang.StringBuffer append(int)>
27 invokevirtual #7 <Method java.lang.String toString()>
30 astore_1
31 iinc 2 1
34 iload_2
35 bipush 100
37 if_icmplt 12
40 return
Instead of explaining what every line does (which I hope should not
be necessary in a Java Specialists' Newsletter), I present
the equivalent Java code for both IBM's Eclipse and Sun. The difference,
which corresponds to the disassembled difference, is in how the
StringBuffer is constructed inside the loop:
public class IbmBasicStringAppend {
  public IbmBasicStringAppend() {
    String s = "";
    for(int i = 0; i < 100; i++) {
      s = new StringBuffer(String.valueOf(s)).append(i).toString();
    }
  }
}

public class SunBasicStringAppend {
  public SunBasicStringAppend() {
    String s = "";
    for(int i = 0; i < 100; i++) {
      s = new StringBuffer().append(s).append(i).toString();
    }
  }
}
It does not actually matter which compiler is better; both are terrible.
The answer is to avoid += with Strings wherever possible.
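As a sketch of the alternative, the loop from the first test can be written with one StringBuffer that lives for the whole loop, so that only a single toString() call happens at the end. The class name SingleBufferAppend and the method concat are made up for this example.

```java
public class SingleBufferAppend {
  /** Appends the ints 0..n-1 using a single StringBuffer. */
  public static String concat(int n) {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < n; i++) {
      sb.append(i); // no intermediate String is created here
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String s = concat(10000);
    System.out.println("Length = " + s.length());
  }
}
```

Compared with the compiler-generated code above, this constructs one StringBuffer and one String in total instead of one of each per iteration.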
Throw the used StringBuffers away!
You should never reuse a StringBuffer object. Construct it, fill it,
convert it to a String, and then throw it away.
Why is this? StringBuffer contains a char[]
which holds the characters to be used for the String. When you call
toString() on the StringBuffer, does it make a copy of
the char[]? No, it assumes that you will
throw the StringBuffer away and constructs a String with a pointer to
the same char[] that is contained inside
StringBuffer! If you do change the StringBuffer after creating
a String, it makes a copy of the char[] and
uses that internally. Do yourself a favour and read the source code
of StringBuffer - it is enlightening.
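The sharing itself is an implementation detail that cannot be seen through the public API, and other JDK versions may simply copy. What can be checked is the contract that it preserves: a String obtained from toString() never changes, even if the StringBuffer is modified afterwards. A small sketch, with a hypothetical class name:

```java
public class ToStringSnapshot {
  /** Returns the String taken before and after a later append. */
  public static String[] snapshotThenModify() {
    StringBuffer sb = new StringBuffer("abc");
    String before = sb.toString();
    // Modifying the buffer must not affect "before"; in the
    // implementation described above this is where the copy happens.
    sb.append("def");
    String after = sb.toString();
    return new String[] { before, after };
  }

  public static void main(String[] args) {
    String[] result = snapshotThenModify();
    System.out.println("before = " + result[0]);
    System.out.println("after  = " + result[1]);
  }
}
```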
But it gets worse than this. In JDK 1.4.1, Sun changed the way that
setLength() works. Before 1.4.1, it was safe to do the following:
... // StringBuffer sb defined somewhere else
sb.append(...);
sb.append(...);
sb.append(...);
String s = sb.toString();
sb.setLength(0);
The code of setLength pre-1.4.1 used to contain the following
snippet of code:
if (count < newLength) {
  // *snip*
} else {
  count = newLength;
  if (shared) {
    if (newLength > 0) {
      copy();
    } else {
      // If newLength is zero, assume the StringBuffer is being
      // stripped for reuse; Make new buffer of default size
      value = new char[16];
      shared = false;
    }
  }
}
It was replaced in the 1.4.1 version with:
if (count < newLength) {
  // *snip*
} else {
  count = newLength;
  if (shared) copy();
}
Therefore, if you reuse a StringBuffer in JDK 1.4.1, and any one of the
Strings created with that StringBuffer is big, every String created from
it afterwards will be backed by a char[] of that same big size, however
short the String is. This is not very
kind of Sun, since it causes bugs in many libraries. However, my argument
is that you should not have reused
StringBuffers anyway, since you will have less overhead simply creating
a new one than setting the size to zero again.
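The retained size can be made visible with capacity(). The sketch below assumes a JDK whose setLength() behaves like the 1.4.1 version shown above, where setLength(0) does not shrink the internal array; on a pre-1.4.1 JDK the first method would be expected to report the default size instead. The class and method names are invented for this example.

```java
public class FreshBuffer {
  /** Capacity left behind after filling a buffer and resetting it. */
  public static int capacityAfterReset(int chars) {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < chars; i++) {
      sb.append('x');
    }
    sb.toString();
    sb.setLength(0); // empties the buffer but keeps the big array
    return sb.capacity();
  }

  /** The throw-away pattern: one new StringBuffer per String. */
  public static String build(String[] parts) {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < parts.length; i++) {
      sb.append(parts[i]);
    }
    return sb.toString(); // sb becomes garbage after this call
  }

  public static void main(String[] args) {
    System.out.println("Capacity after reset = "
        + capacityAfterReset(100000));
    System.out.println(build(new String[] { "a", "b", "c" }));
  }
}
```

With build(), a buffer that once held a large String cannot influence the Strings created later, because each call starts with a fresh buffer.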
This memory leak was pointed out to me by Andrew Shearman during one
of my courses, thank you very much! For more information, you can
visit Sun's
website.
When you read those posts, it becomes apparent that JDOM reuses StringBuffers
extensively. It was probably a bit mean to change StringBuffer's setLength()
method, although I think that it is not a bug. It is simply highlighting
bugs in many libraries.
For those of you that use JDOM, I hope that JDOM will be fixed soon to cater
for this change in the JDK. For the rest of us, let us remember to throw away
used StringBuffers.
So long...
Heinz