The Java Specialists' Newsletter
Issue 222, 2014-09-09. Category: Language. Java version: Java 6, 7, 8

Identity Crisis

by Dr. Heinz M. Kabutz
Abstract:
The JavaDocs for Object.hashCode() seem to suggest that the value is somehow related to the object's memory location. However, in this newsletter we discover that there are several different algorithms and that Java 8 has a new default that gives better threading results if we call hashCode() a lot.

Welcome to the 222nd issue of The Java(tm) Specialists' Newsletter. A few years ago, I bought myself a little birthday present, which I named "Java" (or TZABA in Greek letters). Only fair, considering that it was with income from JAVA that I bought her. Since she has 9x more HP than can be driven without a skipper's license, I verified that my South African permit was valid in Greece. "Yes", the port police said on two occasions. But this year, they changed the fellows working there and their answer became "Yes, but ...". The "but" was rather complicated - I would have to have it translated by the South African embassy and get a letter from them proving its authenticity. Only catch is - our embassy does not do that. Eventually, the only option for me was to write the Greek skipper's license - in GREEK! So how can someone who hardly speaks the language pass a written exam, you might ask? I seriously considered selling TZABA, but her namesake helped me out.

Enter Java to save TZABA. In order to pass my exam, my instructor gave me a list of 104 sample questions - in Greek. I ran them through Google Translate in order to get a very basic idea of what they meant. I then wrote a Java program to quiz me on sections of the questions. The program would present a question to me, with the three possible answers and an option to show my English translation. Initially I relied a lot on the translation, but eventually I was able to answer all 104 questions correctly in Greek. (In fact, I just tried again, 5 weeks after writing the exam, and I could still get 100% right.) In some cases it meant recognizing certain patterns. For example, in most of the questions, the longest answer was the correct one! Greek is rather verbose, so to state something clearly takes a lot of long words, which might explain why they speak so fast here. In some cases, I only really recognized one out of 42 words, for example: "orthi", meaning, at right angles (this is how vessels are supposed to cross a traffic corridor). After preparing myself, I also spent two weeks in the evenings sitting in a skipper's class (in Greek) with the best teacher I've ever had, Pavlos Fourakis, who managed to explain everything despite my poor language skills.

The exam was fun. I had done what I could to prepare myself. I knew the subject very well, having already done the license in South Africa. During the exam, I asked the examiner what a simple Greek word meant. From my grasp of the Greek language, she realized that I would definitely fail. Fortunately, about 70% of the questions were directly from the sample questions and the rest were similar enough that I was able to guess the correct answer. The examiners were astonished when I scored 100% correct in the written exam (as was I) :-) So you see, my dear readers in 135 countries, Java does have its practical uses for day-to-day activities.

NEW: Please see our new "Extreme Java" course, combining concurrency, a little bit of performance and Java 8. Extreme Java - Concurrency & Performance for Java 8.

Identity Crisis

You have probably at some point in your coding career made the classic mistake of forgetting to implement the hashCode() method when writing equals(), with the result that you can insert your object into a HashMap, but can only find it again by iterating over the map. Something like this:

public final class JavaChampion {
  private final String name;

  public JavaChampion(String name) {
    this.name = name;
  }

  public boolean equals(Object o) { // simplified equals()
    if (!(o instanceof JavaChampion)) return false;
    return name.equals(((JavaChampion) o).name);
  }
}
  

And here is a little test with three Java Champions in its map:

import java.util.*;

public class JavaChampionTest {
  public static void main(String... args) {
    Map<JavaChampion, String> urls = new HashMap<>();
    urls.put(new JavaChampion("Jack"), "fasterj.com");
    urls.put(new JavaChampion("Kirk"), "kodewerk.com");
    urls.put(new JavaChampion("Heinz"), "javaspecialists.eu");

    urls.forEach((p,u) -> System.out.println(u)); // Java 8

    System.out.println("URL for Kirk: " +
      urls.get(new JavaChampion("Kirk"))); // url == null
  }
}
  

Most Java programmers can probably guess what the answer is going to be:

heinz$ java JavaChampionTest
javaspecialists.eu
fasterj.com
kodewerk.com
URL for Kirk: null
  

But is it guaranteed to always give this answer? Let's run this again, but with the -XX:hashCode=2 JVM parameter:

heinz$ java -XX:hashCode=2 JavaChampionTest
javaspecialists.eu
fasterj.com
kodewerk.com
URL for Kirk: kodewerk.com
  

Oh wow, did you see that coming? What is -XX:hashCode? (BTW, I first asked the question on Twitter - if you are interested in hearing the latest Java gossip, follow me on heinzkabutz.)

Since we did not explicitly write a hashCode() method in our class, it defaulted to Object.hashCode(). In the JavaDocs it states that "As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java(tm) programming language.)"

Thanks to the comment in the JavaDocs, I always just assumed that the identity hash code, also obtained with System.identityHashCode(object), had something to do with the memory location of the object, or was at least reasonably unique. However, since a modern machine has a 64-bit address space and the identity hash code is only 32 bits wide, clashes have to be possible.

This identity hash code is used in Java Concurrency in Practice to avoid a lock-ordering deadlock. However, in the book Goetz makes provision for the possibility that two distinct objects can have the same identity hash code. I spent several years puzzling over the question: how do you test that? After all, identity hash codes did look quite unique most of the time. For example:

public class UniqueNumbers {
  public static void main(String... args) {
    for (int i = 0; i < 10; i++) {
      System.out.printf("%,d%n", new Object().hashCode());
    }
  }
}
  

Please run it a few times on Java 7 and then a few times on Java 8. Spot any difference?

Here's my output from Java 7:

2,102,755,048
1,766,475,321
189,300,272
1,146,390,297
158,440,795
34,719,285
1,558,954,658
2,050,443,606
1,135,602,633
1,386,281,942
  

And now from Java 8:

1,950,409,828
1,995,265,320
746,292,446
1,072,591,677
1,523,554,304
1,175,962,212
918,221,580
2,055,281,021
1,554,547,125
617,901,222
  

And just for good measure, Java 7, but in 32-bit:

17,547,117
23,818,829
18,693,832
17,102,985
22,615,283
24,024,212
14,455,742
23,584,771
10,749,831
15,359,367
  

Do you notice how much smaller the 32-bit numbers are? And did you see that the identity hash code for 64-bit Java was never negative? And how some numbers always seem to be there, no matter how often you run the class?

First a few words about 32-bit vs 64-bit and then we will examine Java 7 vs Java 8 a bit further.

If you look at the object header in the markOop.hpp file, you will notice that for a 32-bit JVM, we have just 25 bits to store the identity hash code. Thus the highest number we could possibly have is 33,554,431 on a 32-bit JVM. For a 64-bit JVM, we have 31 bits of storage space, which explains why the identity hash codes are always positive.
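
To make the arithmetic concrete, here is a small sketch that computes the largest value that fits into 25 and 31 bits. The class name HashBits and the method maxForBits() are made up for this illustration; they are not part of the JVM.

```java
import java.util.Locale;

// Hypothetical helper, only to illustrate the bit-width arithmetic.
final class HashBits {
  // Largest non-negative value that fits in the given number of bits (1..31).
  static int maxForBits(int bits) {
    return (int) ((1L << bits) - 1);
  }

  public static void main(String... args) {
    // 25 bits, as described above for a 32-bit JVM: 33,554,431
    System.out.printf(Locale.US, "25 bits: %,d%n", maxForBits(25));
    // 31 bits, as described above for a 64-bit JVM: 2,147,483,647
    System.out.printf(Locale.US, "31 bits: %,d%n", maxForBits(31));
    // 31 bits is exactly the positive range of an int
    System.out.println(maxForBits(31) == Integer.MAX_VALUE); // true
  }
}
```

Since 31 bits cover exactly the non-negative int range, the sign bit is never set, which matches the 64-bit output shown earlier.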

Next we will examine the differences between Java 7 and Java 8. Both sets of numbers appear to use the same range of up to 31 bits, but the values themselves differ, which is to be expected. But how are they calculated? Are they memory locations, as suggested in the JavaDocs for Object.hashCode()? When we run Java 7 and 8 with the -XX:+PrintFlagsFinal flag and grep for "hashCode", we see that the hashCode flag is 0 in Java 7 and 5 in Java 8.

To see the various options that we can use, please have a look at the get_next_hash() method in vm/runtime/synchronizer.cpp:

      HashCode==0: Simply returns random numbers with no relation to where in memory the object is found. As far as I can make out, the global read-write of the seed is not optimal for systems with lots of processors.
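      HashCode==1: The values seem to count up from a fairly high starting point. In the source, the value is derived from the object's memory address mixed with a random value, which would explain why objects allocated one after the other get increasing numbers.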
      HashCode==2: Always returns the exact same identity hash code of 1. This can be used to test code that relies on object identity. The reason why JavaChampionTest returned Kirk's URL in the example above is that all objects were returning the same hash code.
      HashCode==3: Counts up the hash code values, starting from zero. It does not look to be thread safe, so multiple threads could generate objects with the same hash code.
      HashCode==4: This seems to have some relation to the memory location at which the object was created.
      HashCode>=5: This is the default algorithm for Java 8 and has a per-thread seed. It uses Marsaglia's xor-shift scheme to produce pseudo-random numbers.
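
To give a feel for the last option, here is a minimal sketch of a Marsaglia xor-shift generator with per-thread state, written in Java. It is not a copy of the HotSpot code: the class XorShiftSketch, its method names, the seed values (taken from the example in Marsaglia's xorshift paper, plus System.nanoTime()) and the use of a ThreadLocal are all assumptions made for illustration.

```java
// Illustrative sketch only; names and seeds are assumptions, not HotSpot's.
final class XorShiftSketch {
  // Four words of generator state; must not all be zero.
  private int x, y, z, w;

  XorShiftSketch(int x, int y, int z, int w) {
    this.x = x;
    this.y = y;
    this.z = z;
    this.w = w;
  }

  // One generator per thread, so no shared seed is read or written.
  private static final ThreadLocal<XorShiftSketch> PER_THREAD =
      ThreadLocal.withInitial(() -> new XorShiftSketch(
          (int) System.nanoTime(), 362436069, 521288629, 88675123));

  // Marsaglia's 128-bit xor-shift step (shift constants 11, 19, 8).
  int next() {
    int t = x ^ (x << 11);
    x = y;
    y = z;
    z = w;
    w = (w ^ (w >>> 19)) ^ (t ^ (t >>> 8));
    return w;
  }

  // Keep only 31 bits, matching the space described for a 64-bit JVM.
  int nextHash() {
    return next() & 0x7fffffff;
  }

  static int nextHashForCurrentThread() {
    return PER_THREAD.get().nextHash();
  }

  public static void main(String... args) {
    for (int i = 0; i < 5; i++) {
      System.out.printf("%,d%n", nextHashForCurrentThread());
    }
  }
}
```

Because every thread only touches its own four state words, threads never contend on a shared seed, which is the scaling benefit over the global seed of HashCode==0.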

Even though -XX:hashCode=5 seems to scale better on machines with lots of processors, it does suffer more acutely from "twins", that is, two distinct objects with the same identity hash code. Let's try to find twins with my class FindTwins, where we create one object and then keep on creating new objects until we find its twin:

public class FindTwins {
  public static void main(String... args) {
    Object obj = new Object();
    Object twin = findTwin(obj);
    System.out.printf("found twin: %s and %s, but == is %b%n",
        obj, twin, obj == twin);
  }

  private static Object findTwin(Object obj) {
    int hash = obj.hashCode();
    Object twin;
    long created = 0;
    do {
      twin = new Object();
      if ((++created & 0xfffffff) == 0) {
        System.out.printf("%,d created%n", created);
      }
    } while (twin.hashCode() != hash);
    System.out.printf("We had to create %,d objects%n", created);
    return twin;
  }
}
  

Here you see the output from running it in Java 7:

268,435,456
536,870,912
805,306,368
1,073,741,824
1,342,177,280
1,610,612,736
1,879,048,192
We had to create 2,147,479,519 objects
found twin: Object@45486b51 and Object@45486b51, but == is false
  

As you can see, we had to create almost 2^31 objects to find a twin. Now let's look at the Java 8 output, which uses the new identity hash code algorithm:

268,435,456
536,870,912
805,306,368
1,073,741,824
1,342,177,280
We had to create 1,608,293,922 objects
found twin: Object@5a07e868 and Object@5a07e868, but == is false
  

Far fewer objects were needed to get back to our object's hash code with Marsaglia's xor-shift algorithm.
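
Twins like these are the reason why the lock-ordering idiom from Java Concurrency in Practice mentioned earlier needs a tie-breaker. Here is a minimal sketch of that idea, not the book's listing: the class OrderedLocking, the method withBothLocks() and the TIE_LOCK field are names I made up for this illustration.

```java
// Illustrative sketch; class, method and field names are assumptions.
final class OrderedLocking {
  // Used only when both objects report the same identity hash code.
  private static final Object TIE_LOCK = new Object();

  // Always acquires the two monitors in identity-hash order, so two
  // threads passing (a, b) and (b, a) cannot deadlock each other.
  static void withBothLocks(Object a, Object b, Runnable action) {
    int ha = System.identityHashCode(a);
    int hb = System.identityHashCode(b);
    if (ha < hb) {
      synchronized (a) { synchronized (b) { action.run(); } }
    } else if (ha > hb) {
      synchronized (b) { synchronized (a) { action.run(); } }
    } else {
      // Twins: the order is ambiguous, so serialize through a third lock.
      synchronized (TIE_LOCK) {
        synchronized (a) { synchronized (b) { action.run(); } }
      }
    }
  }

  public static void main(String... args) throws InterruptedException {
    Object left = new Object(), right = new Object();
    int[] counter = {0};
    Thread t1 = new Thread(() -> {
      for (int i = 0; i < 100_000; i++) {
        withBothLocks(left, right, () -> counter[0]++);
      }
    });
    Thread t2 = new Thread(() -> {
      for (int i = 0; i < 100_000; i++) {
        withBothLocks(right, left, () -> counter[0]++); // opposite order
      }
    });
    t1.start(); t2.start();
    t1.join(); t2.join();
    System.out.println(counter[0]); // 200000
  }
}
```

Running with -XX:hashCode=2 would send every call down the tie-breaker branch, which is one way to exercise the code path that normally almost never runs.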

Let's look at a different way of discovering twins, using the principles of the Birthday Paradox. Instead of looking for the twin of our first object, we look for a twin of any object. As we create objects, we put them into a HashMap with the key being the identity hash code and the value the object. If the key was already in the map, we know that we have found a twin. This should hopefully find a twin faster, if we have enough memory to hold the existing objects:

import java.util.concurrent.*;

public class FindAnyTwin {
  public static void main(String... args) {
    ConcurrentMap<Integer, Object> all =
        new ConcurrentHashMap<>();
    int created = 0;
    Object obj, twin;
    do {
      obj = new Object();
      twin = all.putIfAbsent(obj.hashCode(), obj);
      if ((++created & 0xffffff) == 0) {
        System.out.printf("%,d created%n", created);
      }
    } while (twin == null);
    System.out.println("found twin: " + obj +
        " and " + twin + " but == is " + (obj == twin));
    System.out.println("Size of map is " + all.size());
  }
}
  

I ran this for a while with Java 7, after allocating 14 GB of memory, but could not find a twin:

16,777,216
33,554,432
50,331,648
67,108,864
83,886,080
100,663,296
117,440,512
134,217,728
Exception in thread "main" OutOfMemoryError: Java heap space
  

However, in Java 8 with the new identity hash code algorithm, I found a twin almost immediately:

found twin: Object@7f385cbe and Object@7f385cbe but == is false
Size of map is 105786
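
How many objects would we expect to need? Here is a back-of-the-envelope sketch using the usual Birthday Paradox approximations. It assumes that identity hash codes behave like independent, uniformly distributed 31-bit values, which is a simplification and not something the JVM promises. The class BirthdayEstimate and its methods are made up for this illustration.

```java
import java.util.Locale;

// Illustrative estimate under a uniform-random assumption.
final class BirthdayEstimate {
  // Approximate expected number of draws from n equally likely values
  // until the first repeat: sqrt(pi * n / 2).
  static double expectedDrawsUntilCollision(double n) {
    return Math.sqrt(Math.PI * n / 2);
  }

  // Approximate probability of at least one repeat after k draws:
  // 1 - exp(-k * (k - 1) / (2 * n)).
  static double collisionProbability(long k, double n) {
    return 1 - Math.exp(-(double) k * (k - 1) / (2 * n));
  }

  public static void main(String... args) {
    double n = Math.pow(2, 31); // 31 bits of identity hash code
    System.out.printf(Locale.US, "expected draws: %,.0f%n",
        expectedDrawsUntilCollision(n));
    System.out.printf(Locale.US, "P(twin within 105,786 objects): %.2f%n",
        collisionProbability(105_786, n));
  }
}
```

Under that assumption we would expect the first twin after roughly 58,000 objects, so finding one with about 106,000 objects in the map is in the right ballpark.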
  

I asked on my Twitter feed whether anybody knew how to make the JavaChampionTest produce a non-null URL for Kirk, but besides some nasty code by Peter Lawrey involving overwriting the object header, no one produced the answer I was looking for.
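
Outside of such tricks, the conventional way to get a non-null URL for Kirk is of course to override hashCode() consistently with equals(). Here is a minimal sketch, using a hypothetical FixedJavaChampion class that is not part of the original example:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical corrected variant of JavaChampion.
final class FixedJavaChampion {
  private final String name;

  FixedJavaChampion(String name) {
    this.name = Objects.requireNonNull(name);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof FixedJavaChampion)) return false;
    return name.equals(((FixedJavaChampion) o).name);
  }

  // Equal objects must return equal hash codes, so hash the same
  // field that equals() compares.
  @Override
  public int hashCode() {
    return name.hashCode();
  }

  // Looks up an equal, but not identical, key.
  static String lookupKirk() {
    Map<FixedJavaChampion, String> urls = new HashMap<>();
    urls.put(new FixedJavaChampion("Kirk"), "kodewerk.com");
    return urls.get(new FixedJavaChampion("Kirk"));
  }

  public static void main(String... args) {
    System.out.println("URL for Kirk: " + lookupKirk()); // kodewerk.com
  }
}
```

With this in place the lookup no longer depends on the identity hash code at all, so the -XX:hashCode setting makes no difference to the result.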

One last little gotcha to do with the identity hash code: it is assigned lazily. The first time you call hashCode() on a plain Object (or System.identityHashCode() on any object), the value is calculated and written into the object header. Unfortunately space is at a premium in the object header and there is not enough room for both the biased locking bits and the identity hash code. Thus, as far as I can tell, objects that have had their identity hash code set automatically lose their ability to take part in biased locking. If they already have a lock biased towards a thread, then calling System.identityHashCode(obj) will revoke the bias. You can read more about biased locking in Dave Dice's blog.
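
The effect on biased locking cannot be observed from plain Java code, but the other half of the story can: once the identity hash code has been computed, it stays the same, and it is independent of any hashCode() override. Here is a small sketch; the class IdentityHashDemo and its nested Named class are made up for this illustration.

```java
// Illustrative sketch; class names are assumptions.
final class IdentityHashDemo {
  // A class with its own equals() and hashCode().
  static final class Named {
    private final String name;

    Named(String name) {
      this.name = name;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Named && name.equals(((Named) o).name);
    }

    @Override
    public int hashCode() {
      return name.hashCode();
    }
  }

  // The identity hash code does not change once it has been computed,
  // not even while the object's monitor is held.
  static boolean isStable(Object o) {
    int first = System.identityHashCode(o);
    synchronized (o) {
      return first == System.identityHashCode(o);
    }
  }

  public static void main(String... args) {
    Object plain = new Object();
    // Without an override, hashCode() is the identity hash code.
    System.out.println(plain.hashCode() == System.identityHashCode(plain)); // true
    System.out.println(isStable(plain)); // true

    Named a = new Named("Kirk"), b = new Named("Kirk");
    System.out.println(a.hashCode() == b.hashCode()); // true
    // Usually false, but twins are possible, as we saw above.
    System.out.println(System.identityHashCode(a) == System.identityHashCode(b));
  }
}
```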

Kind regards from Hinxton, home of the stunning European Bioinformatics Institute.

Heinz

© 2010-2016 Heinz Kabutz - All Rights Reserved
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. JavaSpecialists.eu is not connected to Oracle, Inc. and is not sponsored by Oracle, Inc.