|
The Java Specialists' Newsletter
Issue 164 2008-09-29
Category:
Performance
Java version: Java 6 Why 0x61c88647?by Dr. Heinz M. KabutzAbstract:
Prior to Java 1.4, ThreadLocals caused thread contention,
rendering them useless for performant code. In the new
design, each thread contains its own ThreadLocalMap, thus
improving throughput. However, we still face the possibility
of memory leaks due to values not being cleared out of the
ThreadLocalMap with long running threads.
Welcome to the 164th issue of The Java(tm) Specialists' Newsletter, read in at least
116 countries,
with the recent addition of Tunisia. Our kids are back at
school, after 14 weeks of summer holidays. When I was young,
we used to get 6 weeks, of which a substantial amount of time was
spent getting ready for Christmas and surviving New Year. In
South Africa, Christmas is the height of summer :-) Back at
school, Connie promptly caught a nasty flu virus, which I
hope to be able to ward off with heavy doses of vitamins and
zinc.
We were approached a few months ago to present our
Design Patterns
Course to a rather large IT company. Since it would
be a larger contract, we agreed to completely overhaul our
Design Patterns material. J2EE patterns were tossed out
(at long last, I never liked them). In their place, we have
some additional Gang-of-Four patterns.
We also changed the structure of the exercises so that we can
now accommodate up to 50 developers in one class! It was
quite an experience teaching such a large class in one room.
Upcoming Java Specialist Master Courses:
- please click here to sign up.
As from May 2010, we are also offering this course on the island of Crete. We
only accept 6 students per class in Crete, due to the size of our conference
room. Please book early to avoid disappointment!
San Jose CA, Mar 16-19 2010, $3500 Ottawa, Canada, Mar 22-25 2010, $3500 Oslo, Norway, Apr 13-16 2010, Kr 24500 Montreal, Canada, Apr 20-23 2010, $3500 Toronto, Canada, May 17-20 2010, $3500 Chania, Crete, May 25-28, Jun 29-Jul 2 or Aug 24-27 2010, €2500
In-house courses if these dates or locations do not suit you - click here for more information. Why 0x61c88647?
This newsletter is a product of widespread collaboration.
It started with an email from Ant Kutschera, pointing out an
issue
he had noticed with using ThreadLocals.
I had
noticed during one of my Java Specialist Master
Course demonstrations that it was not always that
obvious when values were released from memory with ThreadLocals, which
could lead to memory leaks in applications that reuse threads.
(This newsletter could have been entitled "When Do
ThreadLocal Values Get Cleared?", but I wanted to pique your
interest with that number.)
However, the code for ThreadLocal looked a tad complicated,
with some rather intimidating numbers (0x61c88647), so I
enlisted the help of Joachim
Ansorg, a clever young German software engineer fresh
out of university and my long-time buddy John Green.
In earlier versions of Java, ThreadLocals used to have
contention when accessed from multiple threads, rendering
them practically useless for multi-core applications. In
Java 1.4, a new design was introduced, where the ThreadLocals
were stored in the Thread directly. When we now call get() on
the ThreadLocal, a call back is made to Thread, which returns
an instance of ThreadLocalMap, an inner class of ThreadLocal.
I discovered by experimentation that when a thread exits, it
also removes all of its ThreadLocal values. This happens
before it gets garbage collected, in its exit() method. If
we thus forget to call remove() on a ThreadLocal when we are
done with it, the value will still be made eligible for
garbage collection when the thread exits.
The ThreadLocalMap contains WeakReferences to the
ThreadLocals and normal strong references to the values.
However, it does not consult the ReferenceQueue to discover
which weak references have been cleared, thus an entry might
not be cleared out of the ThreadLocalMap immediately (or at
all, as we shall see.)
Before we delve into the code and try to figure out how
ThreadLocalMap works, I would like to demonstrate a simple
example of how ThreadLocals could be used. Imagine we had a
StupidInhouseFramework, where the author called an abstract
method from the constructor. Something like this:
public abstract class StupidInhouseFramework {
private final String title;
protected StupidInhouseFramework(String title) {
this.title = title;
draw();
}
public abstract void draw();
public String toString() {
return "StupidInhouseFramework " + title;
}
}
You might think that no one would ever call an abstract
method from a constructor, but you are wrong. I even found
places in the JDK where this is done, though I don't remember
where they were. Here is the class that the poor user has
constructed:
public class PoorUser extends StupidInhouseFramework {
private final Long density;
public PoorUser(String title, long density) {
super(title);
this.density = density;
}
public void draw() {
long density_fudge_value = density + 30 * 113;
System.out.println("draw ... " + density_fudge_value);
}
public static void main(String[] args) {
StupidInhouseFramework sif = new PoorUser("Poor Me", 33244L);
sif.draw();
}
}
When we run this, we get a NullPointerException. The field
is of type Long, the wrapper class. The method draw() is
called from the superclass, at which point the PoorUser's
constructor has not been called yet. It is thus still set to
null, which causes a NullPointerException when it is unboxed.
We can solve this problem using ThreadLocal and even though
this is not the typical use case, it is fun to look at.
public class HappyUser extends StupidInhouseFramework {
private final Long density;
private static final ThreadLocal<Long> density_param =
new ThreadLocal<Long>();
private static String setParams(String title, long density) {
density_param.set(density);
return title;
}
private long getDensity() {
Long param = density_param.get();
if (param != null) {
return param;
}
return density;
}
public HappyUser(String title, long density) {
super(setParams(title, density));
this.density = density;
density_param.remove();
}
public void draw() {
long density_fudge_value = getDensity() + 30 * 113;
System.out.println("draw ... " + density_fudge_value);
}
public static void main(String[] args) {
StupidInhouseFramework sif = new HappyUser("Poor Me", 33244L);
sif.draw();
}
}
Just a warning, the example inside the JavaDocs for
ThreadLocal in JDK 1.6 has an obvious typo in it, look
here for a
correction.
When Can ThreadLocal Values Be Garbage Collected?
We said already that they can be garbage collected when the
owning thread exits. However, if the thread belongs to a
thread pool, such as we would find in some application
servers, the values may or may not be garbage collected,
ever.
To demonstrate this, I created several classes with
finalize() methods, which demonstrate when the object has
reached the end of its life.
The first is a simple value that also shows when it is
collected and notifies our test framework. This will allow
us to write actual unit tests demonstrating our findings.
public class MyValue {
private final int value;
public MyValue(int value) {
this.value = value;
}
protected void finalize() throws Throwable {
System.out.println("MyValue.finalize " + value);
ThreadLocalTest.setMyValueFinalized();
super.finalize();
}
}
MyThreadLocal overrides ThreadLocal and prints out a message
when it is being finalized:
public class MyThreadLocal<T> extends ThreadLocal<T> {
protected void finalize() throws Throwable {
System.out.println("MyThreadLocal.finalize");
ThreadLocalTest.setMyThreadLocalFinalized();
super.finalize();
}
}
ThreadLocalUser is a class that would encapsulate a
ThreadLocal. When this is no longer reachable, we would
expect its ThreadLocal to also be collected. Note that in
the JavaDocs we read: ThreadLocal instances
are typically private static fields in classes
that wish to associate state with a thread (e.g., a user ID
or Transaction ID). By constructing lots of instance of
ThreadLocals, we are exhibiting the problem in a more
dramatic fashion.
public class ThreadLocalUser {
private final int num;
private MyThreadLocal<MyValue> value =
new MyThreadLocal<MyValue>();
public ThreadLocalUser() {
this(0);
}
public ThreadLocalUser(int num) {
this.num = num;
}
protected void finalize() throws Throwable {
System.out.println("ThreadLocalUser.finalize " + num);
ThreadLocalTest.setThreadLocalUserFinalized();
super.finalize();
}
public void setThreadLocal(MyValue myValue) {
value.set(myValue);
}
public void clear() {
value.remove();
}
}
The last class is MyThread, which shows when the thread is
collected:
public class MyThread extends Thread {
public MyThread(Runnable target) {
super(target);
}
protected void finalize() throws Throwable {
System.out.println("MyThread.finalize");
ThreadLocalTest.setMyThreadFinalized();
super.finalize();
}
}
The first two test cases illustrate what happens when the
thread local is cleared with the remove() method and when it
is left for the garbage collector to take care of. The
booleans are used to help us write the unit tests.
import junit.framework.TestCase;
import java.util.concurrent.*;
public class ThreadLocalTest extends TestCase {
private static boolean myValueFinalized;
private static boolean threadLocalUserFinalized;
private static boolean myThreadLocalFinalized;
private static boolean myThreadFinalized;
public void setUp() {
myValueFinalized = false;
threadLocalUserFinalized = false;
myThreadLocalFinalized = false;
myThreadFinalized = false;
}
public static void setMyValueFinalized() {
myValueFinalized = true;
}
public static void setThreadLocalUserFinalized() {
threadLocalUserFinalized = true;
}
public static void setMyThreadLocalFinalized() {
myThreadLocalFinalized = true;
}
public static void setMyThreadFinalized() {
myThreadFinalized = true;
}
private void collectGarbage() {
for (int i = 0; i < 10; i++) {
System.gc();
try {
Thread.sleep(50);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
}
public void test1() {
ThreadLocalUser user = new ThreadLocalUser();
MyValue value = new MyValue(1);
user.setThreadLocal(value);
user.clear();
value = null;
collectGarbage();
assertTrue(myValueFinalized);
assertFalse(threadLocalUserFinalized);
assertFalse(myThreadLocalFinalized);
}
// weird case
public void test2() {
ThreadLocalUser user = new ThreadLocalUser();
MyValue value = new MyValue(1);
user.setThreadLocal(value);
value = null;
user = null;
collectGarbage();
assertFalse(myValueFinalized);
assertTrue(threadLocalUserFinalized);
assertTrue(myThreadLocalFinalized);
}
}
In test3(), we demonstrate how a thread closing releases its
ThreadLocal values:
public void test3() throws InterruptedException {
Thread t = new MyThread(new Runnable() {
public void run() {
ThreadLocalUser user = new ThreadLocalUser();
MyValue value = new MyValue(1);
user.setThreadLocal(value);
}
});
t.start();
t.join();
collectGarbage();
assertTrue(myValueFinalized);
assertTrue(threadLocalUserFinalized);
assertTrue(myThreadLocalFinalized);
assertFalse(myThreadFinalized);
}
In our next test, we see how a thread pool can cause the
value to be held captive:
public void test4() throws InterruptedException {
Executor singlePool = Executors.newSingleThreadExecutor();
singlePool.execute(new Runnable() {
public void run() {
ThreadLocalUser user = new ThreadLocalUser();
MyValue value = new MyValue(1);
user.setThreadLocal(value);
}
});
Thread.sleep(100);
collectGarbage();
assertFalse(myValueFinalized);
assertTrue(threadLocalUserFinalized);
assertTrue(myThreadLocalFinalized);
}
So far, we have not seen any great surprises. Now we get to
the interesting test cases. In the next one, we construct a
hundred ThreadLocals and then garbage collect them at the
end. Note that not a single one of the MyValue objects is
garbage collected:
public void test5() throws Exception {
for (int i = 0; i < 100; i++) {
ThreadLocalUser user = new ThreadLocalUser(i);
MyValue value = new MyValue(i);
user.setThreadLocal(value);
value = null;
user = null;
}
collectGarbage();
assertFalse(myValueFinalized);
assertTrue(threadLocalUserFinalized);
assertTrue(myThreadLocalFinalized);
}
In test6(), we see that due to the forced garbage collection,
some of the values are now being collected, but they lag
behind the ThreadLocalUser collections.
public void test6() throws Exception {
for (int i = 0; i < 100; i++) {
ThreadLocalUser user = new ThreadLocalUser(i);
MyValue value = new MyValue(i);
user.setThreadLocal(value);
value = null;
user = null;
collectGarbage();
}
assertTrue(myValueFinalized);
assertTrue(threadLocalUserFinalized);
assertTrue(myThreadLocalFinalized);
}
You can see how the collection of MyValues lags behind in the
output. By the time the program finished, MyValues 98 and 99
had not been collected.
ThreadLocalUser.finalize 96
MyValue.finalize 94
ThreadLocalUser.finalize 97
MyThreadLocal.finalize
MyValue.finalize 96
MyValue.finalize 95
MyThreadLocal.finalize
ThreadLocalUser.finalize 98
ThreadLocalUser.finalize 99
MyThreadLocal.finalize
MyValue.finalize 97
Gaze Inside ThreadLocal
One of the first things that I noticed when I looked inside
the ThreadLocal class was a big fat number 0x61c88647 staring
back at me. This was the HASH_INCREMENT. Every time a new
ThreadLocal is created, it gets a unique hash number by
adding 0x61c88647 to the previous value. I spent most of
yesterday trying to figure out why the engineers had
chosen this particular number. If you google for 61c88647,
you will find some articles in Chinese and a few to do with
encryption.
Besides that, not much else.
My friend John Green had the idea of turning this number into
decimal and repeating the search. The number 1640531527 had
more useful hits. However, in the contexts that we saw, it
was used to in hashing to multiply hash values, not add them.
Also, in all the contexts we found, the actual number was
-1640531527. Some more digging revealed that this number is
the 32-bit signed version of the unsigned number 2654435769.
This number represents the golden ratio (sqrt(5)-1) times
two to the power of 31. The result is then a golden number,
either 2654435769 or -1640531527. You can see the
calculation here:
public class ThreadHashTest {
public static void main(String[] args) {
long l1 = (long) ((1L << 31) * (Math.sqrt(5) - 1));
System.out.println("as 32 bit unsigned: " + l1);
int i1 = (int) l1;
System.out.println("as 32 bit signed: " + i1);
System.out.println("MAGIC = " + 0x61c88647);
}
}
For more information about the Golden Ratio, have a look at
the
Wikipedia link as well as a book
on data structures in C++. For completeness, I also
looked up references to the golden ratio in Donald Knuth's
The Art of Computer Programming .
Donald Knuth belongs on the desk of every serious computer
programmer, in the same way that the Merck Manual adorns your health
practitioner's bookshelf (Available
online). Just don't expect to understand the contents ...
We established thus that the HASH_INCREMENT has something to
do with fibonacci hashing, using the golden ratio. If we look
carefully at the way that hashing is done in the
ThreadLocalMap, we see why this is necessary. The standard
java.util.HashMap uses linked lists
to resolve clashes. The ThreadLocalMap simply looks for the
next available space and inserts the element there. It finds
the first space by bit masking, thus only the lower few bits
are significant. If the first space is full, it simply puts
the element in the next available space. The HASH_INCREMENT
spaces the keys out in the sparce hash table, so that the
possibility of finding a value next to ours is reduced.
When a ThreadLocal is garbage collected, the WeakReference
key in the ThreadLocalMap is cleared. The question we then
need to resolve is when this will be deleted from the
ThreadLocalMap. It does not get cleared out when we
call get() on another entry in the map.
java.util.WeakHashMap expunges all stale entries
from its map even on the get(). The get() would
thus be a bit faster in ThreadLocalMap, but might leave the
table with stale entries, hence memory leaks.
When a ThreadLocal is set(), it could fall into one of three
categories:
- Firstly, we could find the entry and simply set
it. In this case, stale entries are not expunged at all.
-
Secondly, we might find that one of the entries before ours
has become stale, in which case we expunge all stale
entries within our run (that is, between two
null values). If we find our key,
it will be swapped with the stale entry.
-
Thirdly, our run might not have enough space to expand into
in which case the entry is placed in the last null of the
run and some of the stale entries are cleaned out. This
stage uses a O(log2n) algorithm initially, but if it cannot
get below the fill factor, then a full rehash is performed
in O(n).
Lastly, if a key is removed, then the entry is expunged,
together with any other entries in that run.
If you want more information of how this works, you can get
some ideas from Knuth 6.4 Algorithm
R .
Best Practices
If you must use ThreadLocal, make sure that you remove the
value as soon as you are done with it and preferably before
you return your thread to a thread pool. Best practice is to
use remove() rather than
set(null), since that would cause the
WeakReference to be removed immediately, together with the
value.
Kind regards
Heinz
Performance Articles
Related Java Course
|