|
The Java Specialists' Newsletter
Issue 193 2011-06-30
Category:
Performance
Java version: Java 6 Memory Usage of Mapsby Dr. Heinz M. KabutzAbstract: In this newsletter we measure the memory requirements of various types of hash maps available in Java. Maps usually need to be threadsafe, but non-blocking is not always the most important requirement.
Welcome to the 193rd issue of The Java(tm) Specialists' Newsletter, sent from the amazing
Island of Crete. A few weeks ago my Greek teacher introduced
me to "aoristos", the simple past. This is great, because
now I can bore my Greek friends with wild tales of life in
Africa when I was a child. "Ipia me tous elefantes" -
I drank with the elephants. The beauty of being from Africa
is that Europeans will believe anything you say. "Ahh, so
you had a lion as a pet? I knew it!" If you know a bit
of Greek, try
this aoristos flash card test. My name is on top at the
moment, but I'm sure I will easily be dethroned :-)
In my previous
newsletter, I challenged you to explain why
the anonymous class sets the this$0 field
before calling super(). Kai Windmoeller
was the first to send me a partial reason and Wouter
Coekaerts was the first with a perfect explanation. Both Kai
and Wouter subsequently sent me other clever ideas that I
would like to incorporate in future newsletters. [And the
explanation is ....... you'll have to figure that out
yourself :-) A hint though, if you compile the class with
-source 1.3 and -target 1.3, it does not do that. See what
issues that can cause and you will see why we need this.]
Our New Java Design Patterns Self-Study Course is Now Available
Memory Usage of Maps
My newsletter is strongly connected to my courses. When I
research Java for my newsletter, ideas emerge on how to
improve my advanced Java
courses. Questions asked during the courses
often stimulate ideas for new research topics. It is a
delicate ecosystem. They cannot exist without each other.
A few months ago, during one of my masters courses, one of
the programmers mentioned that they had noticed a memory
bottleneck with the ConcurrentHashMap. They were creating
about 100000 maps and wanted them to be threadsafe. The
natural choice seemed to be the ConcurrentHashMap, since it,
well, is supposed to work with concurrent access.
The ConcurrentHashMap splits the bucket table into a number
of segments, thus reducing the probability that you would
have contention when modifying the map. It is quite a clever
design and scales nicely to about 50 cores. Above 50 cores,
you would be better off using Cliff
Click's Highly Scalable Libraries. Since my friends
did not need high scalability, the ConcurrentHashMap seemed
fine.
Whilst doing a memory profile, JVisualVM showed that the top
culprit was the ConcurrentHashMap.Segment class. The default
number of segments per ConcurrentHashMap is 16. The
HashEntry tables within the segments would probably be tiny,
but each Segment is a ReentrantLock. Each ReentrantLock
contains a Sync, in this case a NonFairSync, which is a
subclass of Sync and then AbstractQueuedSynchronizer. Each
of these contains a queue of nodes that maintain state of
what is happening with your threads. It is used when
fairness is determined. This queue and the nodes use a lot
of memory.
Many years ago, I wrote a newsletter that demonstrated a
simple memory test
bench. It would construct an object, then
release it again with System.gc() and measure the difference.
Here is a slightly updated version of the MemoryTestBench.
It still does virtually the same, with the only enhancement
that I sleep a bit after each System.gc() call:
public class MemoryTestBench {
public long calculateMemoryUsage(ObjectFactory factory) {
Object handle = factory.makeObject();
long memory = usedMemory();
handle = null;
lotsOfGC();
memory = usedMemory();
handle = factory.makeObject();
lotsOfGC();
return usedMemory() - memory;
}
private long usedMemory() {
return Runtime.getRuntime().totalMemory() -
Runtime.getRuntime().freeMemory();
}
private void lotsOfGC() {
for (int i = 0; i < 20; i++) {
System.gc();
try {
Thread.sleep(100);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
}
public void showMemoryUsage(ObjectFactory factory) {
long mem = calculateMemoryUsage(factory);
System.out.println(
factory.getClass().getSimpleName() + " produced " +
factory.makeObject().getClass().getSimpleName() +
" which took " + mem + " bytes");
}
}
I created an ObjectFactory interface and some basic classes
to test that it still worked correctly:
public interface ObjectFactory {
Object makeObject();
}
public class BasicObjectFactory implements ObjectFactory {
public Object makeObject() {
return new Object();
}
}
public class IntegerObjectFactory implements ObjectFactory {
public Object makeObject() {
return new Integer(333);
}
}
public class LongObjectFactory implements ObjectFactory {
public Object makeObject() {
return new Long(333L);
}
}
I tried this out with Java 1.6.0_24 on my Mac OS X 10.6.8
and with a self-built Java 1.7.0 based on the OpenJDK.
I also tried 32 vs. 64 bit, server vs. client on the 32 bit
and the flag -XX:+UseCompressedOops on the Server Hotspot
Compiler. The -server and -client made the least difference,
so I only include the -server results. Also, the results
for Java 1.7.0 were similar enough to 1.6.0_24 that I will
only show the 1.6 data:
Java Version 1.6.0_24 with -server
32/64bit 64 64 32
CompressedOops No Yes N/A
Object 24 24 8
Long 24 24 16
Integer 24 24 16
It looks as if the -XX:+UseCompressedOops flag has no effect
on these objects. You will only see the difference with
more complex objects that have pointers to others. This flag
can also speed up your application substantially if you are
using a 64 bit machine.
Here are some factories for creating various hash maps. The
first is not threadsafe, the other two are:
public class HashMapFactory implements ObjectFactory {
public Object makeObject() {
return new HashMap();
}
}
public class SynchronizedHashMapFactory implements ObjectFactory {
public Object makeObject() {
return Collections.synchronizedMap(new HashMap());
}
}
public class HashtableFactory implements ObjectFactory {
public Object makeObject() {
return new Hashtable();
}
}
We see that even basic hash tables differ greatly in size
between various implementations. If memory space is
a major issue, like it was for my friends, then the Java 1.0
Hashtable class might work best. Hashtable is fully
synchronized, which means that it will cause contention when
accessed from more than one core at a time. It also uses
integer division to locate the correct bucket, which is
slower than the bit masking approach used since Java 1.4.
However, if memory is your bottleneck, then Hashtable might
be a good solution. Here are the memory sizes:
32/64bit 64 64 32
CompressedOops No Yes N/A
HashMap 216 128 120
SynchronizedMap 272 160 152
Hashtable 176 112 96
The ConcurrentHashMap allows us to construct it with a
concurrency level, which is used to calculate the number
of segments that the map will contain. The actual number of
segments is a power of 2 greater or equal to the concurrency
level. Thus if we construct a map with concurrency level of
200, it will create 256 segments. As mentioned above, every
segment is subclassed from ReentrantLock. Thus we will show
the sizes for ReentrantLock, and ConcurrentHashMaps with
sizes 16 (the default), 2 and 256:
public class ReentrantLockFactory implements ObjectFactory {
public Object makeObject() {
return new ReentrantLock();
}
}
public class ConcurrentHashMapFactory implements ObjectFactory {
public Object makeObject() {
return new ConcurrentHashMap();
}
}
public class SmallConcurrentHashMapFactory implements ObjectFactory {
public Object makeObject() {
return new ConcurrentHashMap(16, .75f, 2);
}
}
public class BigConcurrentHashMapFactory implements ObjectFactory {
public Object makeObject() {
return new ConcurrentHashMap(16, .75f, 256);
}
}
Based on these results, we can see why the ConcurrentHashMap
uses so much memory, even when it is empty.
32/64bit 64 64 32
CompressedOops No Yes N/A
ReentrantLock 72 56 40
ConcurrentHashMap 2272 1664 1272
SmallConcurrentHashMap 480 312 272
BigConcurrentHashMap 34912 25664 19512
Another option is Cliff
Click's Highly Scalable Libraries.
import org.cliffc.high_scale_lib.*;
public class HighlyScalableTable implements ObjectFactory {
public Object makeObject() {
return new NonBlockingHashMap();
}
}
Click's empty NonBlockingHashMap uses a lot less memory than
the ConcurrentHashMap.
32/64bit 64 64 32
CompressedOops No Yes N/A
ConcurrentHashMap 2272 1664 1272
NonBlockingHashMap 1312 936 872
Let's see what happens when we put some elements into the
map, by writing an ObjectFactory that fills the map with
objects. By adding autoboxed Integers from the integer cache
and constant Strings, we can measure the hash map overhead,
instead of the objects contained inside the map.
public class FullMapObjectFactory implements ObjectFactory {
private final ObjectFactory factory;
protected FullMapObjectFactory(ObjectFactory factory) {
this.factory = factory;
}
public Object makeObject() {
return fill((Map) factory.makeObject());
}
protected Map fill(Map map) {
for (int i = -128; i < 128; i++) {
map.put(i, "dummy");
}
return map;
}
}
With this particular data set, the NonBlockingHashMap uses
the least amount of memory, but I have seen other data sets
where the Hashtable uses the least. You would have to try it
out in your particular situation to find the best possible
map for your needs. Also remember that the
NonBlockingHashMap scales to hundreds of cores, whereas the
Hashtable would have contention with two.
32/64bit 64 64 32
CompressedOops No Yes N/A
ConcurrentHashMap 30024 21088 16872
SmallConcurrentHashMap 16736 10488 8400
BigConcurrentHashMap 48144 34040 26416
Hashtable 15440 9792 7728
NonBlockingHashMap 10912 6696 6632
As in all performance issues, you need to measure and not
guess. Please only change your code if you have measured and
verified that this is indeed your bottleneck.
Kind regards from sunny Crete
Heinz
P.S. You might have seen disturbing images of Greek rioting.
Fortunately this is far away from where we live on the Island
of Crete. The despair is quite real though.
Performance Articles
Related Java Course
Discuss at The Java Specialist Club
|