Running on Java 27-ea+22-2010 (Preview)
Home of The JavaSpecialists' Newsletter

335ZGC Mysteries

Author: Dr Heinz M. KabutzDate: 2026-05-31Java Version: 25Sources on GitHubCategory: Performance
 

Abstract: In our previous newsletter we posted a puzzle about ArrayList creation in ZGC. Lots of Java programmers sent in their ideas, and we are still looking for the perfect answer. Here are some more thoughts on the idea, whilst we wait for The Z Garbage Collector Book to arrive.

 

Welcome to the 335th edition of The Java(tm) Specialists' Newsletter. The best season on Crete, spring, is sadly coming to an end tomorrow and we are entering the hot summer months. Yes, I know that the astronomical summer starts on the summer solstice, but the metereological summer begins tomorrow. *sighs*

In 2017, we did a 15k push-up challenge, where we had to do 15000 in a period of 5 months. It was a lot of fun running into buff Java programmers all over the world. We did another less extreme push-up challenge in 2021, and I kept it up until March this year, doing them every day and accumulating 173793. I took a break for ten weeks to let a spasm in my psoas major muscle recover, and will start again tomorrow morning. Or the day after. Or maybe in July ...

I've also been toying with the idea of running an intermittent fasting challenge, starting with 13 hours and then after a month ramping it up to 18 hours. It would be tied to a charity drive, so that we can also raise funds for a worthy cause. However, Helene pointed out that the average Java programmer does not need to lose weight. Build muscle, yes. Plus diet is very personal and is best advised by a good professional dietician. So as fun as that idea sounded to me, it's been mothballed.

javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.

ZGC Mysteries

Back in January, our wonderful Sharat Chander mentioned a new book The Z Garbage Collector by Erik Österlund [ISBN 1032977728] . I immediately pre-ordered the paperback version, and also offered to do a technical review of the book. Unfortunately, the book was already in its final stages, so Erik graciously declined my attempt to score a sneak preview. Besides, my knowledge of Z is basically coloured pointers bla bla, extremely low latency, use as default. It's been my collector of choice for JavaSpecialists.eu for years.

Last month's puzzle produced some excellent ideas and discussion. Here are the people who responded, in alphabetical order: Aayush Mishra, Anli Shundi, Antoine Dessaigne, Artur Wojciechowski, Changsheng Yang, Erik Osterlund, Eugene Rabii, Fabrice Bibonne, Franklin Nwankwo, Jaromir Hamala, Kirk Pepperdine, Krzysztof Koziol, Lenny Primak, Robert Crida, Stefanos Karasavvidis, Ulrich Grepel, Vijay Lakshminarayanan, and Vivek Ravi. Thank you so much for taking the time to investigate this weird phenomenon.

I don't have a perfect solution to the puzzle, but I do have some ideas that will hopefully add some perspective. First off, as I tried to hint in the previous newsletter, the benchmark was fundamentally broken. If we run an experiment just once, we are unlikely to get meaningful results in Java. We need to let the code cache warm up by running the experiment several times, and then run it for a while. This is what JMH is good at.

The first obvious problem with the code, and which almost everyone found, is that ArrayList starts with a minimally sized array, and then expands that by 50% whenever it runs out of space. When we create a new ArrayList, it makes it with a zero length shared array. Once we add a single element, it replaces that with an array of size 10. That would be the first array creation. It then increases that by 50% to 15, then to 22, 33, 49, etc. We can see how many times it increases to get to 100m, and how many arrays it creates, here:

public class ArrayResizingEvents {
    public void main() {
        var targetSize = 100_000_000;
        var size = 10;
        var resizes = 0;
        var junk = 0;
        while (size < targetSize) {
            junk += size;
            size *= 1.5;
            resizes++;
        }
        System.out.println("final size = " + size);
        System.out.println("resizes = " + resizes);
        System.out.println("junk = " + junk);
    }
}

Output is the following:

final size = 105136605
resizes = 40
junk = 210273213

In this case, we make 40 arrays in increasing size, fill them, then throw them away and copy the elements to the new arrays. If we pre-size the ArrayList to 100m, then it beats the pants off LinkedList in all tests.

However, there are a couple of things that no one mentioned. First off, it appears from my experiments that for ZGC, a reference will always take 8 bytes. With all the other GCs, as long as the heap size is less than 32GB, the references are by default compressed to 4 byte pointers.

We can verify this with -XX:+PrintFlagsFinal:

    java -XX:+PrintFlagsFinal -XX:+UseCompressedOops -XX:+UseG1GC -version | grep Oops
   ### bool UseCompressedOops = true {product lp64_product} {command line}
java -XX:+PrintFlagsFinal -XX:+UseCompressedOops -XX:+UseZGC -version | grep Oops
   ### bool UseCompressedOops = false {product lp64_product} {command line}

LinkedList also suffers from each node needing more memory, but in ArrayList's case, we are allocating much larger arrays.

Secondly, on my machine, I have 96GB of memory. By default, the JVM would get 1/4 of that, thus 24GB. To add 100m elements to an unsized ArrayList would use 1.2GB with compressed oops and 2.3GB without. Also, most of the arrays in terms of total bytes would qualify as humongous objects under G1 and large objects under Z.

Thirdly, I mentioned before that we didn't warmup the JVM properly. I thus introduced a new BetterListPuzzle, which by default does 10 warmup runs and then 30 actual runs. We need to determine whether ArrayList really is in fact slower than LinkedList, once we've warmed up the JVM.

import java.lang.management.ManagementFactory;
import java.util.*;
import java.util.function.*;

public abstract class BetterListPuzzle extends ListPuzzle {
    public int warmups() { return 10; }
    public int repeats() { return 30; }
    public void run(
            Supplier<? extends List<Integer>> supplier) {
        System.out.println("Warmup: 10x");
        for (var w = 0; w < warmups(); w++) {
            super.run(supplier);
        }
        System.out.println("Actual runs: 30x");
        var tmb = ManagementFactory.getThreadMXBean();
        var tim = System.nanoTime();
        var cpu = tmb.getCurrentThreadCpuTime();
        var usr = tmb.getCurrentThreadUserTime();
        try {
            for (var r = 0; r < repeats(); r++) {
                super.run(supplier);
            }
        } finally {
            tim = System.nanoTime() - tim;
            usr = tmb.getCurrentThreadUserTime() - usr;
            cpu = tmb.getCurrentThreadCpuTime() - cpu;
            System.out.println("Total times:");
            System.out.printf("tim=%d,", (tim / 1000000));
            System.out.printf("cpu=%d,", (cpu / 1000000));
            System.out.printf("usr=%d%n", (usr / 1000000));
        }
    }
}

Since I want to also run this in older versions of ZGC, such as Java 17 and 21, I'm going to change the test classes to use the traditional public static void main(String... args). I've also included a test for a pre-sized ArrayList.

public class BetterArrayListPuzzle extends BetterListPuzzle {
    public static void main(String... args) {
        new BetterArrayListPuzzle()
                .run(java.util.ArrayList::new);
    }
}
public class BetterLinkedListPuzzle extends BetterListPuzzle {
    public static void main(String... args) {
        new BetterLinkedListPuzzle()
                .run(java.util.LinkedList::new);
    }
}
public class BetterArrayListPreSizedPuzzle extends
        BetterListPuzzle {
    public static void main(String... args) {
        new BetterArrayListPreSizedPuzzle()
                .run(() -> new java.util.ArrayList<>(
                        100_000_000));
    }
}

Now we can run this in Java 17, 21 and 25, using all the various garbage collectors. Here is the total time taken by 30 iterations on ZGC in Java 25:

class java.util.LinkedList:
tim=64172,cpu=5209,usr=4961
class java.util.ArrayList:
tim=14047,cpu=14035,usr=13356

We see that in terms of user and cpu time, LinkedList is 2.7x faster than an unsized ArrayList, but in elapsed time, it is 4.6x slower. LinkedList used approximately 160ms of cpu/user time per run, but the elapsed time twice shot past 11 seconds. This is something we have to be aware of with ZGC. If it cannot keep up, we can occasionally see very long pauses. Unsized ArrayList had a worst case elapsed time of 1 second, but what was curious about that is that the cpu and user time were about the same. Thus it wasn't a long GC pause like with the LinkedList. For the other runs, it took about 400ms. That is consistent with the overall total result.

If we also add the pre-sized ArrayList into the experiment, we see the following output:

    class java.util.LinkedList:
tim=64172,cpu=5209,usr=4961
class java.util.ArrayList:
tim=14047,cpu=14035,usr=13356
class java.util.ArrayList: (pre-sized)
tim=8688,cpu=8663,usr=8450

Interestingly, the ArrayList, even pre-sized, had higher cpu time than LinkedList, but is a lot cheaper than the unsized ArrayList.

For comparison, here are the total run results for G1, Parallel, and Serial for Java 25:

G1
class java.util.LinkedList:
tim=131791,cpu=5964,usr=5694
class java.util.ArrayList:
tim=49509,cpu=32747,usr=30783
class java.util.ArrayList: (pre-sized)
tim=18437,cpu=18419,usr=17574

Parallel
class java.util.LinkedList:
tim=32176,cpu=4239,usr=4212
class java.util.ArrayList:
tim=8901,cpu=7953,usr=7611
class java.util.ArrayList: (pre-sized)
tim=7705,cpu=7659,usr=6943

Serial
class java.util.LinkedList:
tim=22051,cpu=4683,usr=4537
class java.util.ArrayList:
tim=10963,cpu=8050,usr=7702
class java.util.ArrayList: (pre-sized)
tim=6842,cpu=6827,usr=6800

I read some excellent explanations from my readers as to why ArrayList could be faster, based on profilers. Adding a new node to a LinkedList is cheap, but we pay the price later when we have to collect the nodes again. Plus, if a LinkedList lives in memory for a while, the nodes will no longer occupy a contiguous space in memory, and thus would also not be fast to navigate.

If we roll back to Java 21, we see something very interesting. Here are two results with ZGC, first the default and then generational:

Z
class java.util.LinkedList:
tim=12453,cpu=5922,usr=5555
class java.util.ArrayList:
tim=12170,cpu=12161,usr=11277
class java.util.ArrayList: (pre-sized)
tim=7333,cpu=7325,usr=6541

Z Generational
class java.util.LinkedList:
tim=37908,cpu=5579,usr=5306
class java.util.ArrayList:
tim=13328,cpu=13317,usr=12802
class java.util.ArrayList: (pre-sized)
tim=7676,cpu=7667,usr=7085

For LinkedList, the risk of long STW pauses has gone up dramatically for Z Generational. In my experiment, the elapsed time was 3x slower. We also see that the ArrayList was a bit slower, but not by much.

The most dramatic issue is if we go from Java 21 to Java 25 and we happen to have very large and complex object graphs. LinkedList in this case went from 12453ms in non-generational Java 21 to a whopping 64172ms in generational Java 25. Be careful of "Allocation Stall". We see here that ZGC immediately starts a Major Collection in the background, but if we keep on allocating new objects, it eventually does an "Allocation Stall", where all our threads are paused with stop-the-world (STW) until it has finished the collection. For my 24 GB heap of tiny node objects, this takes several seconds.

Z
[0.005s][info][gc] Using The Z Garbage Collector
Warmup: 10x
class java.util.LinkedList:
[0.349s][info][gc] GC(0) Major Collection (Warmup)
tim=555,cpu=554,usr=374
tim=555,cpu=555,usr=365
tim=524,cpu=523,usr=331
tim=527,cpu=527,usr=330
tim=532,cpu=529,usr=331
tim=522,cpu=521,usr=330
[5.171s][info][gc] GC(1) Minor Collection (Allocation Stall)
[9.123s][info][gc] Allocation Stall (main) 5642.429ms
[9.130s][info][gc] GC(1) Minor Collection (Allocation Stall)
        24576M(100%)->1638M(7%) 3.959s
[9.130s][info][gc] GC(0) Major Collection (Warmup)
        2264M(9%)->1642M(7%) 8.781s
[9.134s][info][gc] GC(2) Minor Collection (Allocation Rate)
tim=5976,cpu=322,usr=235
tim=270,cpu=263,usr=212
tim=276,cpu=262,usr=213
tim=277,cpu=264,usr=213
Actual runs: 30x
tim=268,cpu=266,usr=216
tim=267,cpu=267,usr=218
[11.666s][info][gc] Allocation Stall (main) 922.321ms
[11.950s][info][gc] GC(2) Minor Collection (Allocation Rate)
        1736M(7%)->24576M(100%) 2.816s
[11.950s][info][gc] GC(3) Minor Collection (Allocation Stall)
[16.925s][info][gc] Allocation Stall (main) 5259.733ms
[16.929s][info][gc] GC(3) Minor Collection (Allocation Stall)
        24576M(100%)->5468M(22%) 4.980s
[16.929s][info][gc] GC(4) Major Collection (Warmup)
tim=6424,cpu=239,usr=200
tim=162,cpu=162,usr=158
tim=164,cpu=164,usr=158
tim=162,cpu=162,usr=159
tim=236,cpu=236,usr=201
[22.277s][info][gc] Allocation Stall (main) 4345.753ms
[22.278s][info][gc] Allocation Stall (main) 0.135ms
[22.868s][info][gc] GC(5) Minor Collection (Allocation Stall)
[28.839s][info][gc] Allocation Stall (main) 6561.645ms
[28.842s][info][gc] GC(5) Minor Collection (Allocation Stall)
        24576M(100%)->9280M(38%) 5.973s
[28.845s][info][gc] GC(6) Minor Collection (Allocation Rate)
tim=11187,cpu=278,usr=224

When creating ArrayLists, we never encounter the "Allocation Stall". However, we do see something interesting. When a "Minor Collection" occurs, the current thread's cpu and user times also go up. See "GC(7)" in this output. I would have expected the elapsed time to be a lot higher than cpu and user. Could it be that the current thread is somehow used for the large Z page harvesting? I guess I will have to wait for The Z Garbage Collector Book [ISBN 1032977728] to arrive.

tim=393,cpu=393,usr=387
class java.util.ArrayList:
tim=390,cpu=390,usr=384
class java.util.ArrayList:
tim=440,cpu=440,usr=398
class java.util.ArrayList:
[15.150s][info][gc] GC(7) Minor Collection (Allocation Rate)
[15.408s][info][gc] GC(7) Minor Collection (Allocation Rate)
        15964M(65%)->1710M(7%) 0.258s
tim=1263,cpu=1263,usr=1162
class java.util.ArrayList:
tim=388,cpu=388,usr=383
class java.util.ArrayList:
tim=388,cpu=388,usr=382

Running the ArrayList bench in Async Profiler for two minutes shows the following top culprits:

ns  percent  samples  top
----------  -------  -------  ---
69440000000   53.78%     6944  java.util.ArrayList.add
29490000000   22.84%     2949  copy_oop_uninit_f
14560000000   11.28%     1456  ZUtils::fill
 3090000000    2.39%      309  ZMark::follow_array_elements_small
 2690000000    2.08%      269  ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded
 2670000000    2.07%      267  ZBarrier::mark_from_young_slow_path
 2230000000    1.73%      223  zero_blocks
 1270000000    0.98%      127  ZBarrier::relocate_or_remap

And this is what we see with a pre-sized ArrayList:

 61490000000   43.17%     6149  java.util.ArrayList.add
 21920000000   15.39%     2192  ZBarrierSetRuntime::load_barrier_on_oop_array
 11490000000    8.07%     1149  forward_copy_longs
 11070000000    7.77%     1107  void ZMark::mark_object<true, true, false, false>
  9610000000    6.75%      961  __bzero
  5300000000    3.72%      530  void ZMark::mark_object<false, true, false, true>
  5130000000    3.60%      513  ZRelocate::relocate_object
  4800000000    3.37%      480  ZBarrier::mark_barrier_on_oop_slow_path
  4490000000    3.15%      449  ZBarrier::mark_barrier_on_oop_array
  3740000000    2.63%      374  ZBarrier::relocate_or_mark
  1540000000    1.08%      154  zero_blocks

Kind regards

Heinz

 

Comments

We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)

When you load these comments, you'll be connected to Disqus. Privacy Statement.

Related Articles

Browse the Newsletter Archive

About the Author

Heinz Kabutz Java Conference Speaker

Java Champion, author of the Javaspecialists Newsletter, conference speaking regular... About Heinz

Superpack

Java Specialists Superpack Our entire Java Specialists Training in one huge bundle more...

Free Java Book

Dynamic Proxies in Java Book
Java Training

We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.

Java Consulting

We can help make your Java application run faster and trouble-shoot concurrency and performance bugs...