Home of The JavaSpecialists' Newsletter

**Abstract:** BigInteger has new algorithms for multiplying and dividing large numbers that have a better computational complexity than previous versions of Java. A further improvement would be to parallelize multiply() with Fork/Join.

Welcome to the 236th edition of **The Java(tm) Specialists' Newsletter**, written in the
Aegean Airlines Lounge in Athens on my way home after
renewing my South African passport. Whilst speaking to the
staff at the embassy, I discovered that Marthinus van
Schalkwyk has recently been appointed as the South African
Ambassador to Greece. Not sure what to say about that. Van
Schalkwyk became leader of the National Party in South Africa
after Frederik Willem de Klerk retired. The National Party
is the one that banned the ANC long before I was born. He
then changed the name to "New National Party", a bit like
java.nio (New IO). They then merged with their previous
opposition the Democratic Party and formed yet another party,
the Democratic Alliance. Shortly after some disastrous local
elections, the New National Party guys left the party (so to
speak) and joined the ANC, which they had previously banned.
It was at this point that I gave up trying to understand
politics. Good thing too, seeing how we have a coalition
government in Greece consisting of an extreme left and an
extreme right party. How that marriage has lasted over a year
is anyone's guess. At least I get to exercise my right to
vote in Greece with regularity.

A special welcome to a new subscriber country - Georgia!
We have several Georgians living in Chorafakia and it's great
to now also be sending **The Java(tm) Specialists' Newsletter** to their home country :)

**Learning.JavaSpecialists.EU:** Please visit our
new self-study course catalog to see how you can upskill
your Java knowledge.

Whilst randomly looking through Java code a few months ago, I
noticed that BigInteger had been improved. In previous
versions, multiply worked the way your elementary school
teacher used to torture you: O(n^{2}). In an earlier newsletter, I mentioned
how Karatsuba was a better algorithm to use for larger
numbers. It weighs in at O(n^{lg 3}) or approximately
O(n^{1.585}). Another algorithm is 3-way Toom Cook, which is
O(n^{1.465}). Whilst these look very similar, the difference
is huge as n grows. Here is a small program that
demonstrates it:

publicclassBigOComparison {publicstaticvoidmain(String... args) {for(intn = 1; n <= 1_000_000_000; n *= 10) {doublen_2 = Math.pow(n, 2);doublen_1_585 = Math.pow(n, 1.585);doublen_1_465 = Math.pow(n, 1.465);doublekaratsuba_faster_than_quadratic = n_2 / n_1_585;doubletoom_cook_faster_than_karatsuba = n_1_585 / n_1_465; System.out.printf("%d\t%.2fx\t%.2fx%n", n, karatsuba_faster_than_quadratic, toom_cook_faster_than_karatsuba); } } }

We can see that as n increases in size, the difference
between the computational complexities becomes more apparent. At
n=1_000_000_000, n^{1.585} would be over 5000 faster than n^{2}.
n^{1.465} would be another 12 times faster than that:

1 1.00x 1.00x 10 2.60x 1.32x 100 6.76x 1.74x 1000 17.58x 2.29x 10000 45.71x 3.02x 100000 118.85x 3.98x 1000000 309.03x 5.25x 10000000 803.53x 6.92x 100000000 2089.30x 9.12x 1000000000 5432.50x 12.02x

Of course there are setup costs that are ignored with Big O notation. Because of these, we would only want to use Karatsuba for large numbers and Toom Cook when they are even bigger.

Java 8 now has these two algorithms built in to BigInteger. To see the improvement in performance, here is Fibonacci using Dijkstra's sum of squares algorithm:

importjava.math.*;publicclassFibonacci {publicBigInteger f(intn) {if(n == 0)returnBigInteger.ZERO;if(n == 1)returnBigInteger.ONE;if(n % 2 == 1) {// F(2n-1) = F(n-1)^2 + F(n)^2n = (n + 1) / 2; BigInteger fn_1 = f(n - 1); BigInteger fn = f(n);returnfn_1.multiply(fn_1).add(fn.multiply(fn)); }else{// F(2n) = ( 2 F(n-1) + F(n) ) F(n)n = n / 2; BigInteger fn_1 = f(n - 1); BigInteger fn = f(n);returnfn_1.shiftLeft(1).add(fn).multiply(fn); } }publicBigInteger f_slow(intn) {if(n == 0)returnBigInteger.ZERO;if(n == 1)returnBigInteger.ONE;returnf_slow(n - 1).add(f_slow(n - 2)); } }

The f_slow() method is only there to help us test our fast algorithm, but would not be useful beyond about n=30.

Here is a test class that we can run in Java 7 and 8 to see how the reduced computational complexity in the multiply() algorithm speeds things up in Java 8:

publicclassFibonacciTest {publicstaticvoidmain(String... args) { Fibonacci fib =newFibonacci();for(inti = 0; i < 10; i++) {if(!fib.f(i).equals(fib.f_slow(i))) {thrownewAssertionError("Mismatch at i="+ i); } }for(intn = 10000; n < 50_000_000; n *= 2) { timeTest(fib, n); } }privatestaticvoidtimeTest(Fibonacci fib,intn) { System.out.printf("fib(%,d)%n", n);longtime = System.currentTimeMillis(); System.out.println(fib.f(n).bitLength()); time = System.currentTimeMillis() - time; System.out.println("time = "+ time); } }

Here is the output for Java 7:

heinz$java -showversion FibonacciTestjava version "1.7.0_80" Java(TM) SE Runtime Environment (build 1.7.0_80-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode) fib(10,000) 6942 time = 14 fib(20,000) 13884 time = 10 fib(40,000) 27769 time = 11 fib(80,000) 55539 time = 23 fib(160,000) 111078 time = 51 fib(320,000) 222157 time = 108 fib(640,000) 444314 time = 181 fib(1,280,000) 888629 time = 590 fib(2,560,000) 1777259 time = 2530 fib(5,120,000) 3554518 time = 8785 fib(10,240,000) 7109037 time = 34603 fib(20,480,000) 14218074 time = 142635 fib(40,960,000) 28436148 time = 586950

You can see that as the value of n doubles, the time it takes roughly quadruples. You can also see by the number of bits that the results get rather large.

And now Java 8. For smaller numbers, we don't see much difference to Java 7, but we start to diverge at roughly fib(1,280,000). Java 8 calculates fib(40,960,000) about 50x faster. I wasn't patient enough to calculate larger numbers, since I have a flight to catch this afternoon :-) However, I would expect the next data point to be roughly 75x faster.

heinz$java -showversion FibonacciTestjava version "1.8.0_74" Java(TM) SE Runtime Environment (build 1.8.0_74-b02) Java HotSpot(TM) 64-Bit Server VM (build 25.74-b02, mixed mode) fib(10,000) 6942 time = 6 fib(20,000) 13884 time = 3 fib(40,000) 27769 time = 7 fib(80,000) 55539 time = 16 fib(160,000) 111078 time = 27 fib(320,000) 222157 time = 40 fib(640,000) 444314 time = 58 fib(1,280,000) 888629 time = 155 fib(2,560,000) 1777259 time = 324 fib(5,120,000) 3554518 time = 734 fib(10,240,000) 7109037 time = 1661 fib(20,480,000) 14218074 time = 4412 fib(40,960,000) 28436148 time = 11870

So, now you have seen that Java 8 has improved in at least one area. However, they did not go far enough in my opinion. Both Karatsuba and Toom-Cook can easily be parallelized with recursive decomposition. If you really want to work with such large numbers then you probably also want to throw hardware at your problem. I tried it out by modifying BigInteger and adding a little MultiplyTask:

privatestaticfinalclassMultiplyTaskextendsRecursiveTask<BigInteger> {privatefinalBigInteger b1, b2;publicMultiplyTask(BigInteger b1, BigInteger b2) {this.b1 = b1;this.b2 = b2; }protectedBigInteger compute() {returnb1.multiply(b2); } }

And then inside the multiplyKaratsuba() method I changed

BigInteger p1 = xh.multiply(yh);// p1 = xh*yhBigInteger p2 = xl.multiply(yl);// p2 = xl*yl

To instead do this:

MultiplyTask mt1 = new MultiplyTask(xh, yh); mt1.fork(); BigInteger p2 = xl.multiply(yl);// p2 = xl*ylBigInteger p1 = mt1.join();//xh.multiply(yh); // p1 = xh*yh

By default my code uses the common Fork/Join Pool, but it
would be marvelous to add a new method to BigInteger that
allows us to multiply in parallel, for example:
`BigInteger.multiply(BigInteger, ForkJoinPool)`

or more explicitely
`BigInteger.multiplyParallel(BigInteger, ForkJoinPool)`

.

My modifications to BigInteger worked fairly well. I also used ManagedBlocker to implement a "reserved caching scheme" in Fibonacci to avoid calculating the same number twice. ManagedBlocker worked very nicely and kept my cores more busy.

Here is a tweet where I show my parallel solution without using the ManagedBlocker. Notice how idle my cores are, especially at the beginning of the run when the numbers I am multiplying are small. And another tweet with the same code, but with my "reserved caching scheme" using the ManagedBlocker to keep the ForkJoinPool more lively. fib(1_000_000_000) finished 8% faster and as you can see from my CPU graph, utilized the available hardware much better. I talk about these types of concepts in my Extreme Java - Concurrency & Performance for Java 8 Course in case you'd like to learn how to do it yourself.

I hope you enjoyed this newsletter and that it was useful to you.

Kind regards from sunny and warm Greece

Heinz

We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.

Nobody ever wants to call a Java performance consultant, but with first-hand experience repairing and improving commercial Java applications - JavaSpecialists are a good place to start...