Running on Java 22-ea+27-2262 (Preview)
Home of The JavaSpecialists' Newsletter

031Hash, Hash, Away It Goes!

Author: Dr. Heinz M. KabutzDate: 2001-09-26Java Version: 1.2Category: Language
 

Abstract: When I first started using Java in 1997, I needed very large hash tables for matching records as quickly as possible. We ran into trouble when some of the keys were mutable and ended up disappearing from the table, and then reappearing again later.

 

Welcome to the 31st issue of The Java(tm) Specialists' Newsletter, where we look at how Hash Sets can become really confused. I love hearing from my readers, so please send your comments, praises, criticisms, accolades. The best compliment to me is if you forward this newsletter to other Java fans, for example to your local Java User Group (JUG).

I remember hearing last year that Visual Basic (shudder) was the fastest growing computer language. A report this year found that Java is now growing faster than Visual Basic, which frankly shudders me even more. There are apparently about 2 million people on this globe hacking away at our poor, dear language. Rather discouraging is that my newsletter is only reaching a meager 0.065% of Java programmers. Then again, this is supposed to be The Java Specialists' Newsletter, so maybe there aren't that many of us out there who should be considered specialists ;-)

javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.

Hash, Hash, Away It Goes!

A few weeks ago, I received an email from Charles N. May who pointed out that it is quite easy to change an Integer using reflection in the same way that I showed in my article on Insane Strings. This is only possible because for some strange reason, or no reason at all, the data member value within the Integer class was not declared final.

Charles demonstrated the possible repercussions quite nicely by inserting the same Integer object several times into a java.util.HashSet, each time with different values. The contract of the java.util.Set says:

java.util.Set: A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.

It then goes on to say:

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set.

Interesting, but how do we define mutable, or immutable for that matter? I searched various Java texts but could not find a definition of what is meant by that. Let's try and define it for our use:

Immutable Class: A class where the state cannot be changed.

Simple enough, but what does state refer to? Is it a state descriptor of all the data members in the class? Or is it just the identity of the class? For example, let's consider the class Employee, which is part of the middle class ...

public class Employee {
  private final String name;
  private double salary;
  public Employee(String name, double salary) {
    this.name = name;
    this.salary = salary;
  }
  public String getName() { return name; }
  public double getSalary() { return salary; }
  public void setSalary(double salary) { this.salary = salary; }
}

Is the Employee class mutable? If we say that the state consists of a conglomeration of all the data members then the answer is "Yes". If we say the state consists of the identity, the answer is "No", as we cannot change the state. However, it will get even more confusing as we also have to define identity. I would say that the identity is the part of the class that determines the value of e1.equals(e2) and e1.hashCode(). So, in Employee above, what is the identity? Is it name? Nope, it is actually the location of the object in memory. If we want the identity to be according to the name, we will have to write equals() and hashCode() methods, the details of which are below the scope of TJSN.

Since mutable is so hard to pin down, we should actually rewrite the javadocs for the java.util.Set interface:

Note: Great care must be exercised if objects with a mutable identity are used as set elements. The behaviour of a set is not specified if the value of an object is changed in a manner that affects equals comparisons or its hashcode while the object is an element in the set.

By the description above, we should be careful with objects like String, Integer, etc. where the identity can be changed at runtime using reflection. Alright, it is bad coding practice to change private data members using reflection, but Integer could easily have fitted the above description if the value had been final. Then again, as discussed in my "final" newsletter, it is also bad coding practice to not mark data members final when they should be. Incidentally, java.lang.String is a lost cause - we can never make that properly immutable since its identity is contained in an array and arrays are mutable. Once the 1'998'700 monkeys who don't read this newsletter discover you can change Strings, we are all lost!

So, what happens when you change the identity of an object while it is in the HashSet? Let's have a look:

import java.lang.reflect.Field; // I can't resist using that package
import java.util.*;

public class HashHashGone {
  // Did you know that you can make a static final data member
  // without a value (i.e. blank), as long as you set the value
  // in the static initializer?  I love making every possible
  // field final as it smokes out a lot of bugs.
  private static final Field intValue;
  static {
    try {
      intValue = Integer.class.getDeclaredField("value");
      intValue.setAccessible(true);
    } catch(NoSuchFieldException ex) {
      // we throw a RuntimeException from the static initializer
      // this will actually generate an Error and kill the thread
      // that first accessed GhostSet.
      throw new RuntimeException(
        "Serious error - no \"value\" field found in Integer");
    }
  }

  // This method changes the integer passed into the method to
  // the new value.
  private static void setInteger(Integer i, int newInt) {
    try {
      intValue.set(i, new Integer(newInt));
    } catch (IllegalAccessException ex) {
      throw new RuntimeException(
        "Serious error - field should have been accessible");
    }
  }

  // This method will be used later to print a detailed view of
  // the set, including the value, the class and the identity
  // hash code of the object.
  private static void printDetailed(Set set) {
    System.out.print("[");
    Iterator it = set.iterator();
    while(it.hasNext()) {
      Object val = it.next();
      System.out.print("\t(toString()=" + val + ",class="
        + val.getClass().getName() + ",identityHashCode="
        + System.identityHashCode(val) + ")");
      if (it.hasNext()) System.out.print(", ");
      System.out.println();
    }
    System.out.println("]");
  }

  public static void main(String[] args) {
    if (args.length != 2) {
      System.out.println(
        "arguments: <initial-intvalue> <num-copies-to-insert-in-Set>");
      return;
    }
    int initialValue = Integer.parseInt(args[0]);
    int numberOfCopiesToInsert = Integer.parseInt(args[1]);

    Integer x = new Integer(initialValue);
    int currValue = initialValue;

    Set set = new HashSet();
    for (int i = 0; i < numberOfCopiesToInsert; i++) {
      setInteger(x, ++currValue);
      set.add(x);
    }
    setInteger(x, initialValue);
    System.out.println("here's a set containing " +
      numberOfCopiesToInsert + " copies of Integer(" + x + "): ");
    System.out.println(set);
    System.out.println("detailed view of set:");
    printDetailed(set);
    System.out.println("and does it contain that Integer?: " +
      set.contains(x));

    System.out.println("can the Integer be removed from the Set?");
    System.out.println(set.remove(x));
    System.out.println("the Set contents after attempted remove:");
    System.out.println(set);

    setInteger(x, -initialValue);
    System.out.println(
      "altering the Integer to its opposite makes the Set contain:");
    System.out.println(set);

    setInteger(x, initialValue);
    currValue = initialValue;
    for (int i = 0; i < numberOfCopiesToInsert; i++) {
      setInteger(x, ++currValue);
      set.remove(x);
    }
    System.out.println("now all the elements have been removed "
      + "from the Set as it contains:");
    System.out.println(set);
    System.out.println();
  }
}

When I run this with java HashHashGone 42 5, I get the following output:

here's a set containing 5 copies of Integer(42):
[42, 42, 42, 42, 42]
detailed view of set:
[       (toString()=42,class=java.lang.Integer,identityHashCode=6483656),
        (toString()=42,class=java.lang.Integer,identityHashCode=6483656),
        (toString()=42,class=java.lang.Integer,identityHashCode=6483656),
        (toString()=42,class=java.lang.Integer,identityHashCode=6483656),
        (toString()=42,class=java.lang.Integer,identityHashCode=6483656)
]
and does it contain that Integer?: false
can the Integer be removed from the Set?
false
the Set contents after attempted remove:
[42, 42, 42, 42, 42]
altering the Integer to its opposite makes the Set contain:
[-42, -42, -42, -42, -42]
now all the elements have been removed from the Set as it contains:
[]

The HashSet actually contains a HashMap that keeps the entries as keys and values. The HashMap contains a rehash() method, which is called when the table grows past the threshold determined by the number of entries and fill factor. However, contrary to what I believed until I tried it out today, the rehash method does not consider the possibility that there may be two objects in different hash positions, but equal, in the table. Re-hashing will therefore not solve the problem described above.

Moral of the story, assuming there is still morality left in this world? (btw, have you ever considered how similar the words mortal, moral and mutable are?) OK, the moral is? Never create a class where the identity can be changed once the object has been created.

That's all for this week. Please take a moment to think of how I could improve this newsletter and pop me an email.

Regards

Heinz

 

Comments

We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)

When you load these comments, you'll be connected to Disqus. Privacy Statement.

Related Articles

Browse the Newsletter Archive

About the Author

Heinz Kabutz Java Conference Speaker

Java Champion, author of the Javaspecialists Newsletter, conference speaking regular... About Heinz

Superpack '23

Superpack '23 Our entire Java Specialists Training in one huge bundle more...

Free Java Book

Dynamic Proxies in Java Book
Java Training

We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.

Java Consulting

We can help make your Java application run faster and trouble-shoot concurrency and performance bugs...