Java Specialists' Java Training Europehome of the java specialists' newsletter

The Java Specialists' Newsletter
Issue 1932011-06-30 Category: Performance Java version: Java 6

GitHub Subscribe Free RSS Feed

Memory Usage of Maps

by Dr. Heinz M. Kabutz
Abstract:
In this newsletter we measure the memory requirements of various types of hash maps available in Java. Maps usually need to be threadsafe, but non-blocking is not always the most important requirement.

Welcome to the 193rd issue of The Java(tm) Specialists' Newsletter, sent from the amazing Island of Crete. A few weeks ago my Greek teacher introduced me to "aoristos", the simple past. This is great, because now I can bore my Greek friends with wild tales of life in Africa when I was a child. "Ipia me tous elefantes" - I drank with the elephants. The beauty of being from Africa is that Europeans will believe anything you say. "Ahh, so you had a lion as a pet? I knew it!" If you know a bit of Greek, try this aoristos flash card test. My name is on top at the moment, but I'm sure I will easily be dethroned :-)

In my previous newsletter, I challenged you to explain why the anonymous class sets the this$0 field before calling super(). Kai Windmoeller was the first to send me a partial reason and Wouter Coekaerts was the first with a perfect explanation. Both Kai and Wouter subsequently sent me other clever ideas that I would like to incorporate in future newsletters. [And the explanation is ....... you'll have to figure that out yourself :-) A hint though, if you compile the class with -source 1.3 and -target 1.3, it does not do that. See what issues that can cause and you will see why we need this.]

Memory Usage of Maps

My newsletter is strongly connected to my courses. When I research Java for my newsletter, ideas emerge on how to improve my advanced Java courses. Questions asked during the courses often stimulate ideas for new research topics. It is a delicate ecosystem. They cannot exist without each other.

A few months ago, during one of my masters courses, one of the programmers mentioned that they had noticed a memory bottleneck with the ConcurrentHashMap. They were creating about 100000 maps and wanted them to be threadsafe. The natural choice seemed to be the ConcurrentHashMap, since it, well, is supposed to work with concurrent access.

The ConcurrentHashMap splits the bucket table into a number of segments, thus reducing the probability that you would have contention when modifying the map. It is quite a clever design and scales nicely to about 50 cores. Above 50 cores, you would be better off using Cliff Click's Highly Scalable Libraries. Since my friends did not need high scalability, the ConcurrentHashMap seemed fine.

Whilst doing a memory profile, JVisualVM showed that the top culprit was the ConcurrentHashMap.Segment class. The default number of segments per ConcurrentHashMap is 16. The HashEntry tables within the segments would probably be tiny, but each Segment is a ReentrantLock. Each ReentrantLock contains a Sync, in this case a NonFairSync, which is a subclass of Sync and then AbstractQueuedSynchronizer. Each of these contains a queue of nodes that maintain state of what is happening with your threads. It is used when fairness is determined. This queue and the nodes use a lot of memory.

Many years ago, I wrote a newsletter that demonstrated a simple memory test bench. It would construct an object, then release it again with System.gc() and measure the difference. Here is a slightly updated version of the MemoryTestBench. It still does virtually the same, with the only enhancement that I sleep a bit after each System.gc() call:

public class MemoryTestBench {
  public long calculateMemoryUsage(ObjectFactory factory) {
    Object handle = factory.makeObject();
    long memory = usedMemory();
    handle = null;
    lotsOfGC();
    memory = usedMemory();
    handle = factory.makeObject();
    lotsOfGC();
    return usedMemory() - memory;
  }

  private long usedMemory() {
    return Runtime.getRuntime().totalMemory() -
        Runtime.getRuntime().freeMemory();
  }

  private void lotsOfGC() {
    for (int i = 0; i < 20; i++) {
      System.gc();
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public void showMemoryUsage(ObjectFactory factory) {
    long mem = calculateMemoryUsage(factory);
    System.out.println(
        factory.getClass().getSimpleName() + " produced " +
            factory.makeObject().getClass().getSimpleName() +
            " which took " + mem + " bytes");
  }
}
  

I created an ObjectFactory interface and some basic classes to test that it still worked correctly:

public interface ObjectFactory {
  Object makeObject();
}

public class BasicObjectFactory implements ObjectFactory {
  public Object makeObject() {
    return new Object();
  }
}

public class IntegerObjectFactory implements ObjectFactory {
  public Object makeObject() {
    return new Integer(333);
  }
}

public class LongObjectFactory implements ObjectFactory {
  public Object makeObject() {
    return new Long(333L);
  }
}
  

I tried this out with Java 1.6.0_24 on my Mac OS X 10.6.8 and with a self-built Java 1.7.0 based on the OpenJDK. I also tried 32 vs. 64 bit, server vs. client on the 32 bit and the flag -XX:+UseCompressedOops on the Server Hotspot Compiler. The -server and -client made the least difference, so I only include the -server results. Also, the results for Java 1.7.0 were similar enough to 1.6.0_24 that I will only show the 1.6 data:

Java Version 1.6.0_24 with -server
32/64bit         64   64   32
CompressedOops   No  Yes  N/A
Object           24   24    8
Long             24   24   16
Integer          24   24   16
  

It looks as if the -XX:+UseCompressedOops flag has no effect on these objects. You will only see the difference with more complex objects that have pointers to others. This flag can also speed up your application substantially if you are using a 64 bit machine.

Here are some factories for creating various hash maps. The first is not threadsafe, the other two are:

public class HashMapFactory implements ObjectFactory {
  public Object makeObject() {
    return new HashMap();
  }
}

public class SynchronizedHashMapFactory implements ObjectFactory {
  public Object makeObject() {
    return Collections.synchronizedMap(new HashMap());
  }
}

public class HashtableFactory implements ObjectFactory {
  public Object makeObject() {
    return new Hashtable();
  }
}
  

We see that even basic hash tables differ greatly in size between various implementations. If memory space is a major issue, like it was for my friends, then the Java 1.0 Hashtable class might work best. Hashtable is fully synchronized, which means that it will cause contention when accessed from more than one core at a time. It also uses integer division to locate the correct bucket, which is slower than the bit masking approach used since Java 1.4. However, if memory is your bottleneck, then Hashtable might be a good solution. Here are the memory sizes:

32/64bit          64   64   32
CompressedOops    No   Yes  N/A
HashMap           216  128  120
SynchronizedMap   272  160  152
Hashtable         176  112   96
  

The ConcurrentHashMap allows us to construct it with a concurrency level, which is used to calculate the number of segments that the map will contain. The actual number of segments is a power of 2 greater or equal to the concurrency level. Thus if we construct a map with concurrency level of 200, it will create 256 segments. As mentioned above, every segment is subclassed from ReentrantLock. Thus we will show the sizes for ReentrantLock, and ConcurrentHashMaps with sizes 16 (the default), 2 and 256:

public class ReentrantLockFactory implements ObjectFactory {
  public Object makeObject() {
    return new ReentrantLock();
  }
}

public class ConcurrentHashMapFactory implements ObjectFactory {
  public Object makeObject() {
    return new ConcurrentHashMap();
  }
}

public class SmallConcurrentHashMapFactory implements ObjectFactory {
  public Object makeObject() {
    return new ConcurrentHashMap(16, .75f, 2);
  }
}

public class BigConcurrentHashMapFactory implements ObjectFactory {
  public Object makeObject() {
    return new ConcurrentHashMap(16, .75f, 256);
  }
}
  

Based on these results, we can see why the ConcurrentHashMap uses so much memory, even when it is empty.

32/64bit                    64      64      32
CompressedOops              No     Yes     N/A
ReentrantLock               72      56      40
ConcurrentHashMap         2272    1664    1272
SmallConcurrentHashMap     480     312     272
BigConcurrentHashMap     34912   25664   19512
  

Another option is Cliff Click's Highly Scalable Libraries.

import org.cliffc.high_scale_lib.*;

public class HighlyScalableTableFactory implements ObjectFactory {
  public Object makeObject() {
    return new NonBlockingHashMap();
  }
}
  

Click's empty NonBlockingHashMap uses a lot less memory than the ConcurrentHashMap.

32/64bit                  64     64      32
CompressedOops            No    Yes     N/A
ConcurrentHashMap       2272   1664    1272
NonBlockingHashMap	1312	936	872
  

Let's see what happens when we put some elements into the map, by writing an ObjectFactory that fills the map with objects. By adding autoboxed Integers from the integer cache and constant Strings, we can measure the hash map overhead, instead of the objects contained inside the map.

public class FullMapObjectFactory implements ObjectFactory {
  private final ObjectFactory factory;

  protected FullMapObjectFactory(ObjectFactory factory) {
    this.factory = factory;
  }

  public Object makeObject() {
    return fill((Map) factory.makeObject());
  }

  protected Map fill(Map map) {
    for (int i = -128; i < 128; i++) {
      map.put(i, "dummy");
    }
    return map;
  }
}
  

With this particular data set, the NonBlockingHashMap uses the least amount of memory, but I have seen other data sets where the Hashtable uses the least. You would have to try it out in your particular situation to find the best possible map for your needs. Also remember that the NonBlockingHashMap scales to hundreds of cores, whereas the Hashtable would have contention with two.

32/64bit                    64      64      32
CompressedOops              No     Yes     N/A
ConcurrentHashMap        30024   21088   16872
SmallConcurrentHashMap   16736   10488    8400
BigConcurrentHashMap     48144   34040   26416
Hashtable                15440    9792    7728
NonBlockingHashMap       10912    6696    6632
  

As in all performance issues, you need to measure and not guess. Please only change your code if you have measured and verified that this is indeed your bottleneck.

Kind regards from sunny Crete

Heinz

P.S. You might have seen disturbing images of Greek rioting. Fortunately this is far away from where we live on the Island of Crete. The despair is quite real though.

Performance Articles Related Java Course

Java Master
Java Concurrency
Design Patterns
In-House Courses



© 2010-2014 Heinz Kabutz - All Rights Reserved Sitemap
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. JavaSpecialists.eu is not connected to Oracle, Inc. and is not sponsored by Oracle, Inc.
@CORE_THE_BAND #RBBJGR