Java Specialists' Java Training Europehome of the java specialists' newsletter

The Java Specialists' Newsletter
Issue 1832010-05-31 Category: Performance Java version: 1.2, 1.3, 1.4, 1.5, 6+

GitHub Subscribe Free RSS Feed

Serialization Size of Lists

by Dr. Heinz M. Kabutz
Abstract:
What has a larger serializable size, ArrayList or LinkedList? In this newsletter we examine what the difference is and also why Vector is a poor candidate for a list in a serializable class.

Welcome to the 183rd edition of The Java(tm) Specialists' Newsletter, sent to you from Chania on the beautiful island of Crete. Last week we ran our Java Specialist Master Course at our new conference center in Crete. One of our students enjoyed it so much that he decided to blog about his experiences (unsolicited).

We recently started a Java Specialists Club for those who want to learn more about Java. It is like a gym membership for your mind. Please have a look at http://www.javaspecialists.eu/club for information on how to join. Over the next few months we will run 20 sessions on Design Patterns in Java. Club members are allowed free entrance to these sessions. Please have a look at http://www.javaspecialists.eu/club/dpc.jsp for more information.

NEW: Please see our new "Extreme Java" course, combining concurrency, a little bit of performance and Java 8. Extreme Java - Concurrency & Performance for Java 8.

Serialization Size of Lists

Hopefully we all have some idea of how the ArrayList and LinkedList are structured internally. The ArrayList holds a pointer to a single Object array, which grows as the number of elements exceed the size of the array. The ArrayList's underlying Object array grows by about 50% whenever we run out of space. The LinkedList has pointers to the link nodes at the front and end of the list. Each link node is an object and has pointers forward and backwards and to the object that is being held in the list. The memory requirements of the LinkedList is about 4x larger than an equivalently sized ArrayList. In addition, the ArrayList allows random access into the middle, whereas the LinkedList has a lookup complexity of O(n).

One of the questions I enjoy asking in my Java Specialist Master Course is this: Which list uses more bytes when serialized? LinkedList or ArrayList? Think about this for a bit before reading on, considering the internal structure of each list.

Before I show the answer, I'd like to go back about 8.5 years, when I described memory usage in Java. I had chosen an experimental approach to figuring out how many bytes are occupied by Java objects. At the time, one of my readers told me he measured object size by serializing Java objects to a byte array. I argued that the result would differ completely from the bytes in memory. This newsletter shows an example of how different the results may be.

Most programmers guess that the LinkedList would have a larger serialized size than the ArrayList. However, that is never the case:

import java.io.*;
import java.util.*;

public class ListWritingSize {
  public static void main(String[] args) throws IOException {
    test(new LinkedList<String>());
    test(new ArrayList<String>());
  }

  public static void test(List<String> list) throws IOException {
    for (int i = 0; i < 10; i++) {
      list.add("hello world");
    }
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(baos);
    out.writeObject(list);
    out.close();
    System.out.println(list.getClass().getSimpleName() +
        " used " + baos.toByteArray().length + " bytes");
  }
}

When we run this, we see that the LinkedList uses 107 bytes, whilst the ArrayList uses 117 bytes. Instead of just naively writing the contents of the collections to the stream, the lists both contain writeObject() and readObject() methods. These write the contents of the lists to the stream.

Why the 10 byte difference?

The ArrayList and LinkedList work similarly, in that they write out the number of elements and the actual objects contained in the list. However, ArrayList also writes out the size of the underlying array, used to recreate an identical ArrayList to what was serialized.

This additional int is what makes up the 10 byte difference in the serialization size.

At this point I need to also point out that the String "hello world" is only serialized once, after that only a reference number to the object is written.

What about Vector?

The old Vector class implements serialization in a naive way. They simply do the default serialization, which writes the entire Object[] as-is into the stream. Thus if we insert a bunch of elements into the List, then clear it, the difference between Vector and ArrayList is enormous.

import java.util.*;
import java.io.*;

public class VectorWritingSize {
  public static void main(String[] args) throws IOException {
    test(new LinkedList<String>());
    test(new ArrayList<String>());
    test(new Vector<String>());
  }

  public static void test(List<String> list) throws IOException {
    insertJunk(list);
    for (int i = 0; i < 10; i++) {
      list.add("hello world");
    }
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(baos);
    out.writeObject(list);
    out.close();
    System.out.println(list.getClass().getSimpleName() +
        " used " + baos.toByteArray().length + " bytes");
  }

  private static void insertJunk(List<String> list) {
    for(int i = 0; i<1000 * 1000; i++) {
      list.add("junk");
    }
    list.clear();
  }
}
  

When we run this code, we get the following output:

    LinkedList used 107 bytes
    ArrayList used 117 bytes
    Vector used 1310926 bytes
  

Vector can use a staggering amount of bytes when being serialized. The lesson here? Don't ever use Vector as Lists in objects that are Serializable. The potential for disaster is too great.

Last Puzzle

Here is a question that I puzzled about for a bit. Why does Vector have a writeObject() method that simply does the defaultWriteObject()? It does not have a readObject().

Kind regards

Heinz

Performance Articles Related Java Course

Extreme Java - Concurrency and Performance for Java 8
Extreme Java - Advanced Topics for Java 8
Design Patterns
In-House Courses

© 2010-2016 Heinz Kabutz - All Rights Reserved Sitemap
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. JavaSpecialists.eu is not connected to Oracle, Inc. and is not sponsored by Oracle, Inc.