
Issue 294: String.format() 3x faster in Java 17

Author: Dr Heinz M. Kabutz
Date: 2021-10-29
Java Version: 17
Category: Performance
 

Abstract: One of the most convenient ways of constructing complex Strings is with String.format(). It used to be excessively slow, but in Java 17 it is about 3x faster. In this newsletter we discover what the difference is and where it will help you. We also look at when you should use format() instead of plain String addition with +.

 

Welcome to the 294th edition of The Java(tm) Specialists' Newsletter. We had a lovely run in the rain today, followed by a dip in the sea, which clocked in at 21.6 degrees Celsius. That is bathwater for someone from Bantry Bay! I remember the water in Cape Town being so cold that our breath misted as my brother and I contemplated how crazy we were to spearfish in single-digit water temperatures - and that was in summer.


String.format() 3x faster in Java 17

A few years ago, my friend Dmitry Vyazelenko and I submitted a talk to JavaOne, where we spoke for about an hour about the humble java.lang.String. We have since spoken about this fundamental class at Devoxx, Geecon, Geekout, JAX, Voxxed Days, GOTO, and various JUGs around the world. Who would have thought that we could easily fill an hour with a talk about java.lang.String?

I would usually start the talk by showing a quiz. Which method is the fastest at appending Strings?

public class StringAppendingQuiz {
  public String appendPlain(String question,
                            String answer1,
                            String answer2) {
    return "<h1>" + question + "</h1><ol><li>" + answer1 +
        "</li><li>" + answer2 + "</li></ol>";
  }

  public String appendStringBuilder(String question,
                                    String answer1,
                                    String answer2) {
    return new StringBuilder().append("<h1>").append(question)
        .append("</h1><ol><li>").append(answer1)
        .append("</li><li>").append(answer2)
        .append("</li></ol>").toString();
  }

  public String appendStringBuilderSize(String question,
                                        String answer1,
                                        String answer2) {
    int len = 36 + question.length() + answer1.length() +
        answer2.length();
    return new StringBuilder(len).append("<h1>").append(question)
        .append("</h1><ol><li>").append(answer1)
        .append("</li><li>").append(answer2)
        .append("</li></ol>").toString();
  }
}
  

The audience is encouraged to choose between the three options, appendPlain, appendStringBuilder, and appendStringBuilderSize. Most are torn between the plain and the sized version. But it is a trick question. For such a simple case of appending plain Strings together, the performance is equivalent, whether we use plain + or the StringBuilder, pre-sized or not. However, this changes when we append mixed types, such as some long values and Strings. In that case the pre-sized StringBuilder is the fastest up until Java 8, and from Java 9 onwards, the plain + is fastest.

In comparison, we showed that using String.format() was many times slower. For example, in Java 8, a correctly sized StringBuilder with append completed 17x faster than an equivalent String.format(), whereas in Java 11, the plain + was 39x faster than format(). Despite such huge differences, our recommendation at the end of the talk was the following:

Concatenate using String.format()

  • Simpler to read and maintain
  • For performance-critical code, use + for now
  • In loops still use StringBuilder.append()
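
To make the recommendation concrete, here is a minimal sketch of the quiz method written with String.format() next to the plain + version. The class name FormatQuizSketch and the method appendFormat are hypothetical and were not part of the talk; the point is only that both methods build the same String, with the format version keeping the HTML template in one readable piece.

```java
// Hypothetical sketch, not from the original talk: the quiz method
// written with String.format() next to the plain + version.
class FormatQuizSketch {
  static String appendPlain(String question,
                            String answer1,
                            String answer2) {
    return "<h1>" + question + "</h1><ol><li>" + answer1 +
        "</li><li>" + answer2 + "</li></ol>";
  }

  // Uses three simple %s specifiers, without width or precision
  static String appendFormat(String question,
                             String answer1,
                             String answer2) {
    return String.format(
        "<h1>%s</h1><ol><li>%s</li><li>%s</li></ol>",
        question, answer1, answer2);
  }

  public static void main(String... args) {
    String plain = appendPlain("Fastest?", "plus", "format");
    String formatted = appendFormat("Fastest?", "plus", "format");
    System.out.println(formatted);
    // Both approaches should build the same String
    System.out.println(plain.equals(formatted));
  }
}
```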

In a way it was a hard sell. Why would a programmer knowingly do something that was 40x slower?

The caveat was that the engineers at Oracle knew that String.format() was slow and were working on improving it. We even found a version of Project Amber that compiled the format() code to be the same speed as the plain + operator.

When Java 17 was released, I decided to re-run all our talk benchmarks. It seemed like a waste of time when I started. After all, the benchmarks were already done. Why run them again? For one, the machine that we had originally used was decommissioned, and I wanted to see consistent results throughout the talk by running everything on my performance testing machine. For another, I wanted to see whether there were any changes in the JVM that would affect the results. I did not expect the latter to be a factor.

Imagine my surprise when I noticed that String.format() had drastically improved. Instead of the 2170 ns/op in Java 11, it now took "only" 705 ns/op. Thus instead of being about 40x slower than the plain +, String.format() was only about 12 times slower. Or seen from another perspective, Java 17 String.format() is 3x faster than Java 16.
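
The ratios can be checked with a little arithmetic. The sketch below uses only the figures quoted in this newsletter (2170 ns/op, 705 ns/op and the 39x factor for Java 11). The cost of the plain + is not stated directly, so it is inferred from the 39x factor and assumed to be unchanged in Java 17; that inference and the class name FormatSpeedupArithmetic are my own assumptions.

```java
// Back-of-the-envelope check of the ratios quoted in the text.
class FormatSpeedupArithmetic {
  static final double FORMAT_JAVA_11_NS = 2170; // quoted in the text
  static final double FORMAT_JAVA_17_NS = 705;  // quoted in the text
  static final double PLUS_FACTOR_JAVA_11 = 39; // quoted in the text

  // Assumption: cost of plain + inferred from the 39x figure and
  // taken to be the same in Java 17
  static double inferredPlusNs() {
    return FORMAT_JAVA_11_NS / PLUS_FACTOR_JAVA_11;
  }

  // How much faster format() became between the two measurements
  static double formatSpeedup() {
    return FORMAT_JAVA_11_NS / FORMAT_JAVA_17_NS;
  }

  // How much slower format() still is than the inferred plain +
  static double remainingSlowdown() {
    return FORMAT_JAVA_17_NS / inferredPlusNs();
  }

  public static void main(String... args) {
    System.out.printf("format() speedup: %.1fx%n", formatSpeedup());
    System.out.printf("still slower than +: %.1fx%n",
        remainingSlowdown());
  }
}
```

Under these assumptions the speedup comes out at roughly 3x and the remaining gap to the plain + at a little under 13x, in line with the rounded numbers above.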

This is excellent news, but under what circumstances is it faster? I shared my discovery with Dmitry Vyazelenko, and he pointed me to some work by Claes Redestad in JDK-8263038 : Optimize String.format for simple specifiers. The actual code is available in GitHub OpenJDK.

Claes was kind enough to respond to my query and confirmed that we can expect the formatting to be faster for simple specifiers, in other words, the percentage sign % followed by a single character from the set "bBcCtTfdgGhHaAxXno%eEsS". If a specifier has any further formatting, such as width, precision or justification, then it might not necessarily be faster.
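
As a rough illustration of what "simple" means here, the following sketch scans a format String and reports whether every % is followed directly by one of the characters listed above. It is not the JDK implementation; the class SimpleSpecifierCheck and the method hasOnlySimpleSpecifiers are hypothetical names, and the character set is copied verbatim from the paragraph above.

```java
// Hypothetical helper, not JDK code: checks whether a format String
// contains only "simple" specifiers as described in the text.
class SimpleSpecifierCheck {
  // Character set as quoted in the newsletter
  private static final String SIMPLE_CONVERSIONS =
      "bBcCtTfdgGhHaAxXno%eEsS";

  static boolean hasOnlySimpleSpecifiers(String format) {
    int i = format.indexOf('%');
    while (i >= 0) {
      // A trailing % has no conversion character at all
      if (i + 1 >= format.length()) return false;
      char conversion = format.charAt(i + 1);
      // Flags, width or precision after % make it non-simple
      if (SIMPLE_CONVERSIONS.indexOf(conversion) < 0) return false;
      i = format.indexOf('%', i + 2);
    }
    return true;
  }

  public static void main(String... args) {
    String[] formats = {"%s, %d%n", "100%% sure", "%-20s", "%.2f"};
    for (String format : formats) {
      System.out.println(format.strip() + " -> " +
          hasOnlySimpleSpecifiers(format));
    }
  }
}
```

Format Strings for which this check returns true are the ones that should benefit most from the Java 17 change.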

How does this magic work? Every time we call, for example, String.format("%s, %d%n", name, age), the String "%s, %d%n" has to be parsed. This is done in the java.util.Formatter#parse() method, which used the following regex to break up the formatting elements:

// %[argument_index$][flags][width][.precision][t]conversion
private static final String formatSpecifier
    = "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])";

private static final Pattern fsPattern = Pattern.compile(formatSpecifier);
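
To see what this regex extracts, here is a small sketch that applies the same pattern to a few format Strings and prints the captured groups. The class FormatSpecifierRegexDemo and its describe method are hypothetical and only mimic the matching step; they do not reproduce what Formatter does with the groups afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo, not JDK code: shows which parts of a format
// specifier end up in which capturing group of the quoted regex.
class FormatSpecifierRegexDemo {
  // %[argument_index$][flags][width][.precision][t]conversion
  private static final Pattern FS_PATTERN = Pattern.compile(
      "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])");

  // Returns one description per specifier found in the format String
  static List<String> describe(String format) {
    List<String> result = new ArrayList<>();
    Matcher m = FS_PATTERN.matcher(format);
    while (m.find()) {
      result.add(m.group() +
          " -> index=" + m.group(1) +
          ", flags=" + m.group(2) +
          ", width=" + m.group(3) +
          ", precision=" + m.group(4) +
          ", conversion=" + m.group(6));
    }
    return result;
  }

  public static void main(String... args) {
    for (String format : List.of("%s, %d%n", "%-20s", "%1$,.2f")) {
      describe(format).forEach(System.out::println);
    }
  }
}
```

Groups that do not take part in a match, such as the width of a plain %s, are reported as null.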
  

In the pre-17 code, parse() would always start by applying the regex to the format String. In Java 17, however, parse() first tries to scan the format String manually. If all the FormatSpecifiers are "simple", then we can avoid the regex parsing altogether. If we do find one that is not simple, then the regex takes over from that point onwards. This speeds up the parsing by a factor of 3 for simple format Strings. Here is a test program where I parse the following Strings:

// should be faster
"1. this does not have any percentages at all"
// should be faster
"2. this %s has only a simple field"
// might be slower
"3. this has a simple field %s and then a complex %-20s"
// no idea
"4. %s %1s %2s %3s %4s %5s %10s %22s"
  

We pass these Strings to the private Formatter#parse method using MethodHandles and measure how long it takes in Java 16 and 17.

With Java 16, we got the following results on our test server:

Best results:
1. this does not have any percentages at all
	137ms
2. this %s has only a simple field
	288ms
3. this has a simple field %s and then a complex %-20s
	487ms
4. %s %1s %2s %3s %4s %5s %10s %22s
	1557ms
  

With Java 17, we got the following results:

Best results:
1. this does not have any percentages at all
	21ms     // 6.5x faster
2. this %s has only a simple field
	32ms     // 9x faster
3. this has a simple field %s and then a complex %-20s
	235ms    // 2x faster
4. %s %1s %2s %3s %4s %5s %10s %22s
	1388ms   // 1.12x faster
  

We can thus expect a big difference with format strings that have simple fields, which would constitute the vast majority of cases. Well done to Claes Redestad for putting in the effort to make this faster. I'm going to stick with my advice to use String.format(), or even better, the relatively new formatted() method, and let the JDK developers speed it up for us.
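
For readers who have not used it yet, formatted() is an instance method on String, available as a standard API since Java 15, that applies the receiver as the format String. The sketch below shows the two styles side by side; the class FormattedDemo and its method names are hypothetical.

```java
// Hypothetical demo: String.format() and formatted() side by side.
// Requires Java 15 or later for formatted().
class FormattedDemo {
  static String viaFormat(String name, int age) {
    return String.format("%s is %d years old", name, age);
  }

  // Same format String, but called on the String itself
  static String viaFormatted(String name, int age) {
    return "%s is %d years old".formatted(name, age);
  }

  public static void main(String... args) {
    System.out.println(viaFormat("Duke", 26));
    System.out.println(viaFormatted("Duke", 26));
  }
}
```

Both calls use only simple specifiers, so they are the kind of format String that the Java 17 change targets.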

Here is the test code for the Formatter#parse measurements in case you'd like to try it yourself. We use the following JVM parameters: -showversion --add-opens java.base/java.util=ALL-UNNAMED -Xmx12g -Xms12g -XX:+UseParallelGC -XX:+AlwaysPreTouch -verbose:gc

import java.lang.invoke.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

// run with
// -showversion --add-opens java.base/java.util=ALL-UNNAMED
// -Xmx12g -Xms12g -XX:+UseParallelGC -XX:+AlwaysPreTouch
// -verbose:gc
public class MixedAppendParsePerformanceDemo {
  private static final Map<String, LongAccumulator> bestResults =
      new ConcurrentSkipListMap<>();

  public static void main(String... args) {
    String[] formats = {
        // should be faster
        "1. this does not have any percentages at all",
        // should be faster
        "2. this %s has only a simple field",
        // might be slower
        "3. this has a simple field %s and then a complex %-20s",
        // no idea
        "4. %s %1s %2s %3s %4s %5s %10s %22s",
    };

    System.out.println("Warmup:");
    run(formats, 5);
    System.out.println();

    bestResults.clear();

    System.out.println("Run:");
    run(formats, 10);
    System.out.println();

    System.out.println("Best results:");
    bestResults.forEach((format, best) ->
        System.out.printf("%s%n\t%dms%n", format,
            best.longValue()));
  }

  private static void run(String[] formats, int runs) {
    for (int i = 0; i < runs; i++) {
      for (String format : formats) {
        Formatter formatter = new Formatter();
        test(formatter, format);
      }
      System.gc();
      System.out.println();
    }
  }

  private static void test(Formatter formatter, String format) {
    System.out.println(format);
    long time = System.nanoTime();
    try {
      for (int i = 0; i < 1_000_000; i++) {
        parseMH.invoke(formatter, format);
      }
    } catch (Throwable throwable) {
      throw new AssertionError(throwable);
    } finally {
      time = System.nanoTime() - time;
      bestResults.computeIfAbsent(format, key ->
              new LongAccumulator(Long::min, Long.MAX_VALUE))
          .accumulate(time / 1_000_000);
      System.out.printf("\t%dms%n", (time / 1_000_000));
    }
  }

  private static final MethodHandle parseMH;

  static {
    try {
      parseMH = MethodHandles.privateLookupIn(Formatter.class,
              MethodHandles.lookup())
          .findVirtual(Formatter.class, "parse",
              MethodType.methodType(List.class, String.class));
    } catch (ReflectiveOperationException e) {
      throw new Error(e);
    }
  }
}
  

There's more good news coming about performance improvements in Java 17.

Kind regards

Heinz

 
