Abstract: How much memory was wasted when an additional boolean field was added to java.lang.String in Java 13? None at all. This article explains why.
Welcome to the 278th edition of The Java(tm) Specialists' Newsletter, sent to you from the stunning Island of Crete. During the lockdown period, we are fortunately still allowed to go out for exercise. Thus my daily runs are continuing. I regularly share the lovely views on @heinzkabutz.
My book "Dynamic Proxies in Java" has now been published and you can get your free copy of the e-book from InfoQ.
 
   
Last month, in newsletter 277, I wrote about a change in Java 13 that avoids recalculating the hash code of a String in the unlikely case that it is 0. I saw several objections to the change, asking why Oracle had added another field to String, thus increasing its memory consumption.
Object size in Java is somewhat hard to determine. We do not have a sizeof operator, and the answer varies by JVM and configuration. For example, a 64-bit JVM with compressed OOPs uses 4 bytes for a reference and 12 bytes for the object header. If our JVM is configured with a maximum heap of 32 GB or more, compressed OOPs are switched off, so a reference is 8 bytes and the object header is 16 bytes.
   One thing that is consistent across all the JVMs I have
   looked at is that objects are aligned on 8-byte boundaries.
   This means that the actual memory usage of an object will
   always be a multiple of 8.
   A java.lang.Boolean instance thus needs 12 bytes for
   the object header and one byte for the boolean, totalling
   13 bytes. However, it will use 16 bytes, wasting 3 bytes due
   to object alignment.
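
The rounding described above can be sketched in a few lines of Java. This is only an illustration of the arithmetic, not how the JVM is implemented; the class name ObjectAlignment and the method alignUp are made up for this example, and the 12-byte header assumes a 64-bit JVM with compressed OOPs.

```java
// Sketch: rounding an object's raw size up to the JVM's alignment boundary.
// ObjectAlignment and alignUp are hypothetical names used for illustration.
class ObjectAlignment {
    /** Rounds size up to the next multiple of alignment (must be a power of two). */
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        long header = 12;       // assumed: 64-bit JVM with compressed OOPs
        long booleanField = 1;  // the single boolean inside java.lang.Boolean
        long raw = header + booleanField;
        long aligned = alignUp(raw, 8);
        System.out.println("raw=" + raw + " aligned=" + aligned
            + " padding=" + (aligned - raw));
    }
}
```

With a 16-byte header the same calculation would be alignUp(17, 8), which rounds up to 24.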
   
   In the past, I used all sorts of trickery for guessing the
   object size. Nowadays I use JOL
      (Java Object Layout). For example, here is the output
   when we look at the internals of
   java.lang.Boolean:
   
java.lang.Boolean object internals:
 OFFSET  SIZE    TYPE DESCRIPTION
      0     4         (object header)
      4     4         (object header)
      8     4         (object header)
     12     1 boolean Boolean.value
     13     3         (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
As we see, the instance size is 16 bytes and we have three bytes that are unused space.
If we create a JVM with a 32 GB heap (-Xmx32g), then the object header uses 16 bytes and thus the raw size is 17 bytes. However, the instance occupies 24 bytes, due to object alignment:
java.lang.Boolean object internals:
 OFFSET  SIZE    TYPE DESCRIPTION
      0     4         (object header)
      4     4         (object header)
      8     4         (object header)
     12     4         (object header)
     16     1 boolean Boolean.value
     17     7         (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 7 bytes external = 7 bytes total
   Let's get back to String and consider the object sizes over
   the versions of Java. We are ignoring the size of the
   char[] or byte[] that contains the
   actual text.
   
Java 6 used 32 bytes, since String still stored the offset and count fields:
# java version "1.6.0_65"
 OFFSET  SIZE   TYPE DESCRIPTION
      0     4        (object header)
      4     4        (object header)
      8     4        (object header)
     12     4 char[] String.value
     16     4    int String.offset
     20     4    int String.count
     24     4    int String.hash
     28     4        (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
   (Incidentally, when the cached hash was added to
   String in Java 1.3, most JVMs were 32-bit and the object
   header was just 8 bytes. In those days, the extra
   hash field fitted into the wasted space. Another
   interesting factoid from 2001 - in those days every field took
   at least 4 bytes, even boolean and byte. That changed in Java
   1.4. Enough ancient history!)
   
   Java 7 decreased this to 24 bytes by dropping offset and
   count. The hash32 field was an optimization to mitigate
   denial-of-service (DoS) attacks on hash maps.
   It was "free" in terms of memory usage, since without it
   we would have had 4 unused bytes anyway.
   
# openjdk version "1.7.0_252"
java.lang.String object internals:
 OFFSET  SIZE   TYPE DESCRIPTION
      0     4        (object header)
      4     4        (object header)
      8     4        (object header)
     12     4 char[] String.value
     16     4    int String.hash
     20     4    int String.hash32
Instance size: 24 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
   Java 8 got rid of the hash32 field,
   which was replaced with a generalized solution inside
   java.util.HashMap. This did not save any memory in String,
   since those 4 bytes are now "wasted" due to the next object
   alignment.
   
# openjdk version "1.8.0_242"
java.lang.String object internals:
 OFFSET  SIZE   TYPE DESCRIPTION
      0     4        (object header)
      4     4        (object header)
      8     4        (object header)
     12     4 char[] String.value
     16     4    int String.hash
     20     4        (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
   Java 9 changed the array type to byte[]
   and added a coder field. However, the String object
   still uses 24 bytes, with 3 bytes lost due to object alignment.
   
# java version "9.0.4" build 9.0.4+11
java.lang.String object internals:
 OFFSET  SIZE   TYPE DESCRIPTION
      0     4        (object header)
      4     4        (object header)
      8     4        (object header)
     12     4 byte[] String.value
     16     4    int String.hash
     20     1   byte String.coder
     21     3        (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
   Java 13 added the hashIsZero boolean field,
   which in Java uses 1 byte. It fits into the 3 bytes that were
   previously lost to object alignment, leaving 2 bytes of padding.
   Thus, as stated in the abstract, adding this new field did not
   cost any additional memory.
   
# openjdk version "13.0.2" 2020-01-14 build 13.0.2+8
java.lang.String object internals:
 OFFSET  SIZE    TYPE DESCRIPTION
      0     4         (object header)
      4     4         (object header)
      8     4         (object header)
     12     4  byte[] String.value
     16     4     int String.hash
     20     1    byte String.coder
     21     1 boolean String.hashIsZero
     22     2         (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 2 bytes external = 2 bytes total
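
The size arithmetic for the String layouts from Java 6 to Java 13 can be reproduced with a toy model. This is a sketch under stated assumptions: a 12-byte header, 8-byte alignment, and fields simply summed, which works here only because none of these layouts contains an internal gap. The class StringShallowSize and its method instanceSize are hypothetical names, not part of JOL or the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: header + field bytes, rounded up to the 8-byte boundary.
// Assumes a 12-byte header (64-bit JVM, compressed OOPs) and no internal gaps.
class StringShallowSize {
    static final int HEADER = 12;
    static final int ALIGNMENT = 8;

    /** Shallow size of an object with the given field sizes in bytes. */
    static int instanceSize(int... fieldSizes) {
        int size = HEADER;
        for (int fieldSize : fieldSizes) {
            size += fieldSize;
        }
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        Map<String, int[]> fieldsByVersion = new LinkedHashMap<>();
        fieldsByVersion.put("Java 6", new int[] {4, 4, 4, 4}); // value, offset, count, hash
        fieldsByVersion.put("Java 7", new int[] {4, 4, 4});    // value, hash, hash32
        fieldsByVersion.put("Java 8", new int[] {4, 4});       // value, hash
        fieldsByVersion.put("Java 9", new int[] {4, 4, 1});    // value, hash, coder
        fieldsByVersion.put("Java 13", new int[] {4, 4, 1, 1}); // ... plus hashIsZero
        for (Map.Entry<String, int[]> e : fieldsByVersion.entrySet()) {
            System.out.println(e.getKey() + ": " + instanceSize(e.getValue()) + " bytes");
        }
    }
}
```

Only the Java 6 layout crosses the 24-byte boundary in this model; every later field list, including the one with hashIsZero, rounds up to the same 24 bytes.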
When I ran the test in Java 15, I noticed a slight change in the object layout:
# openjdk version "15-ea" 2020-09-15 - build 15-ea+20-899
java.lang.String object internals:
 OFFSET  SIZE    TYPE DESCRIPTION
      0     4         (object header)
      4     4         (object header)
      8     4         (object header)
     12     4     int String.hash
     16     1    byte String.coder
     17     1 boolean String.hashIsZero
     18     2         (alignment/padding gap)
     20     4  byte[] String.value
Instance size: 24 bytes
Space losses: 2 bytes internal + 0 bytes external = 2 bytes total
After some searching, I found Shipilev's "Java Objects Inside Out" article, which includes a link to an enhancement added in Java 15. Since Java 15, the field layout is computed a bit differently and the JVM can pack fields across class hierarchies. This has a whole bunch of implications for high performance Java. I would encourage you to read Shipilev's article.
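
To see why the same four fields give external padding in Java 13 but an internal gap in Java 15, here is a minimal sketch that places fields in a given order, aligning each one to its own size. This is a toy model and an assumption on my part, not HotSpot's actual layout algorithm; the class FieldPlacement and its layout method are hypothetical names. It assumes a 12-byte header and 4-byte compressed references.

```java
// Toy model: place fields in the given order, each aligned to its own size,
// then round the end of the object up to 8 bytes. Not HotSpot's real algorithm.
class FieldPlacement {
    /** Returns {internalGapBytes, externalPaddingBytes, instanceSize}. */
    static int[] layout(int headerSize, int... fieldSizes) {
        int offset = headerSize;
        int internal = 0;
        for (int size : fieldSizes) {
            int aligned = (offset + size - 1) / size * size; // natural alignment
            internal += aligned - offset;                    // gap before this field
            offset = aligned + size;
        }
        int instanceSize = (offset + 7) / 8 * 8;
        return new int[] {internal, instanceSize - offset, instanceSize};
    }

    static void print(String label, int[] r) {
        System.out.println(label + ": " + r[0] + " bytes internal + "
            + r[1] + " bytes external, instance size " + r[2]);
    }

    public static void main(String[] args) {
        // Order seen in the Java 13 output: value, hash, coder, hashIsZero
        print("Java 13 order", layout(12, 4, 4, 1, 1));
        // Order seen in the Java 15 output: hash, coder, hashIsZero, value
        print("Java 15 order", layout(12, 4, 1, 1, 4));
    }
}
```

In both orders the model arrives at 24 bytes with 2 bytes lost; the field order only decides whether the loss shows up as an internal gap or as padding at the end.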
Kind regards from Crete
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)