Running on Java 16-ea+15-631 (Preview)

# 201Fork/Join with Fibonacci and Karatsuba

##### Author: Dr. Heinz M. KabutzDate: 2012-05-24Java Version: 7Category: Concurrency

Abstract: The new Java 7 Fork/Join Framework allows us to define our algorithms using recursion and then to easily parallelize them. In this newsletter we describe how that works using a fast Fibonacci algorithm that uses the sum of the squares rather than brute force. We also present a faster algorithm for multiplying two large BigInteger numbers, using the Fork/Join Framework and the Karatsuba algorithm.

Welcome to the 201st issue of The Java(tm) Specialists' Newsletter sent to you from the Island of Crete. One of the main tourist areas, Platanias, is also a breeding ground for the loggerhead turtle, called Caretta-Caretta. Why they allowed hotels and beach umbrellas in that area boggles the mind. Last month, I took some friends out for a drive in my boat and almost ran over a turtle that was swimming in the middle of the sea. It was the first time I saw one of these critters in the wild and it was a completely amazing experience watching and filming it. My son was kind enough to turn our recordings into a short documentary about the Caretta-Caretta. Some viewers in Canada had errors when they tried to play it. If that happens, please just try again. I think you will enjoy seeing the turtle as much as I did.

### Fork/Join with Fibonacci and Karatsuba

Figuring out new language features can be daunting. The JavaDocs are not always that helpful. Here is an example shown in the RecursiveTask JavaDocs to demonstrate how the fork/join framework should be used:

```class Fibonacci extends RecursiveTask<Integer> {
final int n;
Fibonacci(int n) { this.n = n; }
Integer compute() {
if (n <= 1)
return n;
Fibonacci f1 = new Fibonacci(n - 1);
f1.fork();
Fibonacci f2 = new Fibonacci(n - 2);
return f2.compute() + f1.join();
}
}
```

What is wrong with this example? A minor issue is that it contains a fairly basic Java syntax error. The compute() method is implementing a protected abstract method defined in the RecursiveTask class. In the example, the method is package access. It should either be protected or public. We can widen the access to an overridden method, but we cannot narrow it. Stylistically also, the field "n" should be private.

However, the syntax error will be semi-automatically repaired by any decent IDE. The real problem is one of computational complexity. The algorithm for Fibonacci presented here is exponential. Solving f(n+1) requires approximately twice as many steps as f(n). Thus if we manage to solve f(n) with a single core within some time, then with a 1024 processor machine, we will only be able to solve the f(n+10) in the same amount of time.

The idea of using Fibonacci as an example is not bad, but the choice of an exponential algorithm was unfortunate. I tried to produce a fast Java implementation of the famous function. Exactly three months ago I posted a challenge on java.net to try to find Fibonacci of one billion. Alexey Solodovnikov managed to solve the problem in 36 seconds using gcc 4.5.2 + gmp with a 3 GHz single core machine. His Java version took about 2.5 hours. Rexy Young solved it using Dijkstra's famous sum of squares algorithm (more later) in 55 minutes. The code I will present in this newsletter solved the problem without any 3rd party libraries in 34 minutes in Java on my 8-core server.

Something else I have done is start uploading the code from my newsletters onto GitHub. You can grab it from git://github.com/kabutz/javaspecialists.git if you like and then build it with Maven. It will still be changing a lot until I have decided how to arrange the various packages. I also still have to pull the source code from over 200 Java newsletters into my project tree. This will take a while, but I think it's good that my friend Alvaro pushed me into this direction. Thanks Alvaro! The license will probably be Apache 2.0, unless there are any compelling reasons why that won't do.

The first class we have is Fibonacci. It contains caching behaviour enabled by default, since a lot of Fibonacci algorithms do much better with it turned on. You might notice that the calculate() and doActualCalculate() methods both declare that they throw the InterruptedException. We can use interrupts to cancel long calculations.

```import java.math.*;

public abstract class Fibonacci {
private final FibonacciCache cache;

protected Fibonacci(FibonacciCache cache) {
this.cache = cache;
}

public Fibonacci() {
this(null);
}

public BigInteger calculate(int n)
throws InterruptedException {
if (cache == null) return doActualCalculate(n);

BigInteger result = cache.get(n);
if (result == null) {
cache.put(n, result = doActualCalculate(n));
}
return result;
}

protected abstract BigInteger doActualCalculate(int n)
throws InterruptedException;
}
```

Here is an implementation that uses the caching feature of the Fibonacci class. The algorithm is recursive, but thanks to the cache, we can do all the calculations in linear time. However, the BigInteger.add() method is also linear, which means that overall, this function ends up being quadratic. This means that if we double the size of n, it will take four times as long to solve. In addition, because it is still recursive in nature, the stack depth is a limiting factor in which numbers we can calculate.

```import java.math.*;

public class FibonacciRecursive extends Fibonacci {
public BigInteger doActualCalculate(int n)
throws InterruptedException {
if (n < 0) throw new IllegalArgumentException();
if (n == 0) return BigInteger.ZERO;
if (n == 1) return BigInteger.ONE;
return calculate(n - 1).add(calculate(n - 2));
}
}
```

At this point, we should probably show the special FibonacciCache class. It contains the optimization that if we request "n" and the cache does not contain that, we see whether "n-1" and "n-2" are contained. If they are, then we can use those to quickly calculate the Fibonacci value for "n". Similarly, we can look for "n+1" and "n+2" or "n+1" and "n-1" to find "n" quickly. It also tries to avoid calculating the same value twice through an elaborate reserved caching scheme (see cacheReservation and solutionArrived fields). Unfortunately this solution does not increase the number of worker threads in the ForkJoinPool if one of the threads is blocked waiting for another thread to finish its work on a number.

One of the things that kept on happening was that several threads would request the same number at the same time. Since neither of them found it, they would both go and work out the value. This wasted a lot of computing power. Thus, before we return "null" from this cache, we reserve the right to calculate the number by adding the value of "n" into the cacheReservation set. If another thread now tries to calculate the same fibonacci number, it is suspended until the first thread finds the answer. The speedup with this optimization was quite dramatic, but required that the number or threads in the ForkJoinPool exceeded the number of cores.

```import java.math.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.locks.*;

class FibonacciCache {
private final ConcurrentMap<Integer, BigInteger> cache =
new ConcurrentHashMap<>();

private final Lock lock = new ReentrantLock();
private final Condition solutionArrived = lock.newCondition();
private final Set<Integer> cacheReservation = new HashSet<>();

public BigInteger get(int n) throws InterruptedException {
lock.lock();
try {
while (cacheReservation.contains(n)) {
// we now want to wait until the answer is in the cache
solutionArrived.await();
}
BigInteger result = cache.get(n);
if (result != null) {
return result;
}

BigInteger nMinusOne = cache.get(n - 1);
BigInteger nMinusTwo = cache.get(n - 2);
if (nMinusOne != null && nMinusTwo != null) {
put(n, result);
return result;
}
BigInteger nPlusOne = cache.get(n + 1);
BigInteger nPlusTwo = cache.get(n + 2);
if (nPlusOne != null && nPlusTwo != null) {
result = nPlusTwo.subtract(nPlusOne);
put(n, result);
return result;
}
if (nPlusOne != null && nMinusOne != null) {
result = nPlusOne.subtract(nMinusOne);
put(n, result);
return result;
}
return null;
} finally {
lock.unlock();
}
}

public void put(int n, BigInteger value) {
lock.lock();
try {
solutionArrived.signalAll();
cacheReservation.remove(n);
cache.putIfAbsent(n, value);
} finally {
lock.unlock();
}
}
}
```

Some of the algorithms do not need caching. For example, if we solve the problem with iteration, then we will never try to find the same Fibonacci number more than once. Our NonCachingFibonacci abstract base class redefines the calculate() method as abstract and stops us from overriding doActualCalculate():

```import java.math.*;

public abstract class NonCachingFibonacci extends Fibonacci {
protected NonCachingFibonacci() {
super(null);
}

public final BigInteger doActualCalculate(int n)
throws InterruptedException {
throw new UnsupportedOperationException();
}

public abstract BigInteger calculate(int n)
throws InterruptedException;
}
```

The first example that uses the NonCachingFibonacci is the iterative solution. If we ran this with "long" values, it would be very fast. However, we are slowed down by the BigInteger method being linear, which again makes this function quadratic, similar to the FibonacciRecursive class shown earlier. Since it is not recursive, this function does not run out of stack space as would happen with FibonacciRecursive.

```import java.math.*;

public class FibonacciIterative extends NonCachingFibonacci {
public BigInteger calculate(int n)
throws InterruptedException {
if (n < 0) throw new IllegalArgumentException();
BigInteger n0 = BigInteger.ZERO;
BigInteger n1 = BigInteger.ONE;
for (int i = 0; i < n; i++) {
BigInteger temp = n1;
n0 = temp;
}
return n0;
}
}
```

The classical Fibonacci algorithm as described in the RecursiveTask JavaDocs is exponential. The only way of making it a bit performant is to cache the previous values. In this version we do not cache, making it incredibly slow. We can thus only calculate for very small values of n.

```import java.math.*;

public class FibonacciRecursiveNonCaching
extends NonCachingFibonacci {
public BigInteger calculate(int n)
throws InterruptedException {
if (n < 0) throw new IllegalArgumentException();
if (n == 0) return BigInteger.ZERO;
if (n == 1) return BigInteger.ONE;
return calculate(n - 1).add(calculate(n - 2));
}
}
```

We have shown simple iterative and recursive solutions. However, there is also a formula that we can use to calculate a specific Fibonacci number, without having to go back and calculate the previous numbers. This formula depends on the square root of 5. With "long" and "double", we can work out an accurate value of Fibonacci up to n=71.

Let `phi = (1 + root5) / 2` and `psi = (1 - root5) / 2`. `Fibonacci(n) = (phi^n - psi^n) / (phi - psi)`, where "^" means "to the power of". Or we could simply say `Fibonacci(n) = (phi^n - psi^n) / root5`.

```import java.math.*;

public class FibonacciFormulaLong extends NonCachingFibonacci {
private static final double root5 = Math.sqrt(5);
private static final double PHI = (1 + root5) / 2;
private static final double PSI = (1 - root5) / 2;
private static final int MAXIMUM_PRECISE_NUMBER = 71;

public BigInteger calculate(int n)
throws InterruptedException {
if (n < 0) throw new IllegalArgumentException();
if (n > MAXIMUM_PRECISE_NUMBER)
throw new IllegalArgumentException(
"Precision loss after " + MAXIMUM_PRECISE_NUMBER);
return new BigInteger(Long.toString(fibWithFormula(n)));
}

private static long fibWithFormula(int n) {
return (long)((Math.pow(PHI, n) - Math.pow(PSI, n)) / root5);
}
}
```

If we want to be able to work out Fibonacci(1000) with this formula, we are forced to use BigDecimal. Unfortunately it is a rather slow class. I used a value of root5 that would give me an accurate Fibonacci number up to n=1000. The more accurate we make root5, the slower this algorithm takes. The biggest time sink is the call to divide(root5) at the end of the calculation.

```import java.math.*;

public class FibonacciFormulaBigInteger extends NonCachingFibonacci {
private static final BigDecimal root5 = new BigDecimal(
"2.23606797749978969640917366873127623544061835961152572" +
"4270897245410520925637804899414414408378782274969508176" +
"1507737835042532677244470738635863601215334527088667781" +
"7319187916581127664532263985658053576135041753378");
private static final BigDecimal PHI = root5.add(
new BigDecimal(1)).divide(new BigDecimal(2));
private static final BigDecimal PSI = root5.subtract(
new BigDecimal(1)).divide(new BigDecimal(2));
private static final int MAXIMUM_PRECISE_NUMBER = 1000;

public BigInteger calculate(int n) throws InterruptedException {
if (n < 0) throw new IllegalArgumentException();
if (n > MAXIMUM_PRECISE_NUMBER)
throw new IllegalArgumentException(
"Precision loss after " + MAXIMUM_PRECISE_NUMBER);

BigDecimal phiToTheN = PHI.pow(n);
BigDecimal psiToTheN = PSI.pow(n);
BigDecimal phiMinusPsi = phiToTheN.subtract(psiToTheN);
BigDecimal result = phiMinusPsi.divide(
root5, 0, RoundingMode.UP);
return result.toBigInteger();
}
}
```

So far, the best we have achieved is a linear algorithm, but with the addition of linear BigInteger.add(), we ended up with quadratic performance. Can we do better?

One of the algorithms that is popular is Dijkstra's sum of the squares is also mentioned on R.Knott's website.

My implementation goes like this: if n is odd, then `f(2n-1) = f(n-1)^2 + f(n)^2`; otherwise `f(2n) = (2 * f(n-1) + f(n)) * f(n)`.

```import java.math.*;

public class FibonacciRecursiveDijkstra extends Fibonacci {
public BigInteger doActualCalculate(int n)
throws InterruptedException {
if (n == 0) return BigInteger.ZERO;
if (n == 1) return BigInteger.ONE;
if (n % 2 == 1) {
// f(2n-1) = f(n-1)^2 + f(n)^2
int left = (n + 1) / 2;
int right = (n + 1) / 2 - 1;
} else {
// f(2n) = (2 * f(n-1) + f(n)) * f(n)
int n_ = n / 2;
BigInteger fn = calculate(n_);
BigInteger fn_1 = calculate(n_ - 1);
}
}

protected BigInteger multiply(BigInteger bi0, BigInteger bi1) {
return bi0.multiply(bi1);
}

protected BigInteger square(BigInteger num) {
return multiply(num, num);
}
}
```

The algorithm now is logarithmic, meaning that we can solve Fibonacci(1000) with about 10 calculations. I expected this to be really fast and to outperform all the other algorithms. After all, the iterative and simple recursive solutions both are quadratic. However, it turns out that the bottleneck in the above code is the BigInteger.multiply() method, which is quadratic. So we now have O(n^2 * log n) performance. Good enough to find Fibonacci for a thousand but not for a billion.

After some research, I stumbled across the algorithm by Karatsuba. Instead of quadratic performance, it can achieve 3n^1.585 performance. There are other algorithms one can use as your numbers get larger, but I tried to build this solver without using third-party libraries. Examples of third-party libraries are JScience's LargeInteger, which internally uses Karatsuba and also a parallel version. However, I really wanted a solution that could cooperate with an existing ForkJoinPool rather than start new threads. Another option would be to simply use a Java wrapper to GMP, which should give us the same speed as a C wrapper to GMP, since all we are doing is delegate the Fibonacci function to GMP.

For large numbers, Karatsuba's multiplication algorithm, is a lot faster than ordinary BigInteger multiply. We give two implementations, the BasicKaratsuba for single-threaded calculations and the ParallelKaratsuba that utilizes the Fork/Join Framework. Another place that mentions Karatsuba is in Sedgewick and Wayne's book Introduction to Programming in Java: An Interdisciplinary Approach.

```import java.math.*;

public interface Karatsuba {
BigInteger multiply(BigInteger x, BigInteger y);
}
```

Our Karatsuba calculations use some utility functions for adding and splitting BigIntegers.

```import java.math.*;

class BigIntegerUtils {
public static BigInteger add(BigInteger... ints) {
BigInteger sum = ints;
for (int i = 1; i < ints.length; i++) {
}
return sum;
}

public static BigInteger[] split(BigInteger x, int m) {
BigInteger left = x.shiftRight(m);
BigInteger right = x.subtract(left.shiftLeft(m));
return new BigInteger[]{left, right};
}
}
```

The BasicKaratsuba contains a threshold value. If either of the numbers being multiplied is less than the threshold, then we use the standard BigInteger.multiply().

```import java.math.*;
import static eu.javaspecialists.tjsn.math.numbers.BigIntegerUtils.*;

public class BasicKaratsuba implements Karatsuba {
public static final String THRESHOLD_PROPERTY_NAME =
"eu.javaspecialists.tjsn.math.numbers.BasicKaratsubaThreshold";
private static final int THRESHOLD = Integer.getInteger(
THRESHOLD_PROPERTY_NAME, 1000);

public BigInteger multiply(BigInteger x, BigInteger y) {
int m = java.lang.Math.min(x.bitLength(), y.bitLength())/2;
if (m <= THRESHOLD)
return x.multiply(y);

// x = x1 * 2^m + x0
// y = y1 * 2^m + y0
BigInteger[] xs = BigIntegerUtils.split(x, m);
BigInteger[] ys = BigIntegerUtils.split(y, m);

// xy = (x1*2^m + x0)(y1*2^m + y0) = z2*2^2m + z1*2^m + z0
// where:
// z2 = x1 * y1
// z0 = x0 * y0
// z1 = x1 * y0 + x0 * y1 = (x1 + x0)(y1 + y0) - z2 - z0
BigInteger z2 = multiply(xs, ys);
BigInteger z0 = multiply(xs, ys);
subtract(z2).subtract(z0);

// result = z2 * 2^2m + z1 * 2^m + z0
}
}
```

The ParallelKaratsuba is almost identical to the BasicKaratsuba, except that it distributes the individual tasks to the ForkJoinPool. We can thus utilize the available hardware to multiply the numbers faster. Note how similar it looks to the sequential version in BasicKaratsuba. As long as your algorithm is recursive, it is very easy to employ the Fork/Join framework to speed up your work.

```import java.math.*;
import java.util.concurrent.*;

import static eu.javaspecialists.tjsn.math.numbers.BigIntegerUtils.*;

public class ParallelKaratsuba implements Karatsuba {
public static final String THRESHOLD_PROPERTY_NAME =
"eu.javaspecialists.tjsn.math.numbers.ParallelKaratsubaThreshold";
private static final int THRESHOLD = Integer.getInteger(
THRESHOLD_PROPERTY_NAME, 1000);
private final ForkJoinPool pool;

public ParallelKaratsuba(ForkJoinPool pool) {
this.pool = pool;
}

public BigInteger multiply(BigInteger x, BigInteger y) {
}

private final BigInteger x, y;

public KaratsubaTask(BigInteger x, BigInteger y) {
this.x = x;
this.y = y;
}

protected BigInteger compute() {
int m = (Math.min(x.bitLength(), y.bitLength()) / 2);
if (m <= THRESHOLD) {
return x.multiply(y);
}

BigInteger[] xs = split(x, m);
BigInteger[] ys = split(y, m);

BigInteger z0, z2;

}
}
}
```

Karatsuba makes a big difference in the performance, especially if we are able to utilize the available cores. We can do the same thing with Dijkstra's Fibonacci algorithm where we distribute the various Fibonacci calculations to the various cores. However, the biggest speedup is probably from the Karatsuba multiply, since that is where most of the time is spent. Our parallel solution uses its own internal FibonaccciCache.

```import java.math.*;
import java.util.concurrent.*;

public class FibonacciRecursiveParallelDijkstraKaratsuba
extends NonCachingFibonacci {
public static final String SEQUENTIAL_THRESHOLD_PROPERTY_NAME =
"eu.javaspecialists.tjsn.math.fibonacci.SequentialThreshold";
private final static int SEQUENTIAL_THRESHOLD =
Integer.getInteger(SEQUENTIAL_THRESHOLD_PROPERTY_NAME, 10000);
private final FibonacciCache cache = new FibonacciCache();
private final Fibonacci sequential =
new FibonacciRecursiveDijkstraKaratsuba();
private final ForkJoinPool pool;
private final Karatsuba karatsuba;

public FibonacciRecursiveParallelDijkstraKaratsuba(
ForkJoinPool pool) {
this.pool = pool;
karatsuba = new ParallelKaratsuba(pool);
}

public BigInteger calculate(int n) throws InterruptedException {
if (result == null) throw new InterruptedException();
return result;
}

private final int n;

this.n = n;
}

protected BigInteger compute() {
try {
BigInteger result = cache.get(n);
if (result != null) {
return result;
}

if (n < SEQUENTIAL_THRESHOLD) {
result = sequential.calculate(n);
} else {
if (n % 2 == 1) {
// f(2n-1) = f(n-1)^2 + f(n)^2
int left = (n + 1) / 2;
int right = (n + 1) / 2 - 1;
f1.fork();
BigInteger bi0 = f0.invoke();
BigInteger bi1 = f1.join();
if (isCancelled()) return null;
} else {
// f(2n) = (2 * f(n-1) + f(n)) * f(n)
int n_ = n / 2;
f1.fork();
BigInteger bi0 = f0.invoke();
BigInteger bi1 = f1.join();
if (isCancelled()) return null;
result = karatsuba.multiply(
}
}
cache.put(n, result);
return result;
} catch (InterruptedException e) {
cancel(true);
return null;
}
}

private BigInteger square(BigInteger num) {
return karatsuba.multiply(num, num);
}
}
}
```

The FibonacciGenerator uses the given Fibonacci function to work out the result as a BigDecimal. However, the conversion of BigDecimal to a String is again, you guessed it, quadratic. Instead, what we do is print out the first and last 10 bytes and an Adler32 checksum.

```import java.math.*;
import java.util.zip.*;

public class FibonacciGenerator {
private final Fibonacci fib;

public FibonacciGenerator(Fibonacci fib) {
this.fib = fib;
}

public void findFib(int n) throws InterruptedException {
System.out.printf("Searching for Fibonacci(%,d)%n", n);
long time = System.currentTimeMillis();
BigInteger num = fib.calculate(n);
time = System.currentTimeMillis() - time;
printProof(num);
System.out.printf("  Time to calculate %d ms%n%n", time);
}

private void printProof(BigInteger num) {
System.out.printf("  Number of bits: %d%n", num.bitLength());
byte[] numHex = num.toByteArray();
System.out.print("  First 10 bytes: ");
for (int i = 0; i < 10; i++) {
System.out.printf(" %02x", numHex[i]);
}
System.out.println();

System.out.print("  Last 10 bytes:  ");
for (int i = numHex.length - 10; i < numHex.length; i++) {
System.out.printf(" %02x", numHex[i]);
}
System.out.println();

ck.update(numHex, 0, numHex.length);
ck.getValue());
}
}
```

The FibonacciGeneratorExample finds Fibonacci of one billion in about 34 minutes on my 8-core server.

```import java.util.concurrent.*;

public class FibonacciGeneratorExample {
private static ForkJoinPool pool = new ForkJoinPool(
Runtime.getRuntime().availableProcessors() * 4);

public static void main(String[] args)
throws InterruptedException {
int[] ns;
if (args.length != 0) {
ns = new int[args.length];
for (int i = 0; i < ns.length; i++) {
ns[i] = Integer.parseInt(args[i]);
}
} else {
ns = new int[]{
1_000_000,
10_000_000,
100_000_000, // takes a bit long
1_000_000_000, // takes a bit long
};
}
test(new FibonacciRecursiveParallelDijkstraKaratsuba(pool), ns);
}

private static void test(Fibonacci fib, int... ns)
throws InterruptedException {
for (int n : ns) {
FibonacciGenerator fibgen = new FibonacciGenerator(fib);
fibgen.findFib(n);
System.out.println(pool);
}
}
}
```

I would love to hear if you can come up with an algorithm in Java to solve Fibonacci one billion faster on my hardware than my 34.58 minutes. Fastest solution gets a whopping 34.58% discount on the Java Symposium entrance fees! The rules are: you need to write the code yourself, but you can use existing algorithms.

Kind regards

Heinz

P.S. Did you know that the Fibonacci series occurs in nature? For example, the shells of a turtle are arranged according to the series. That's something to think about as you watch this big fellow swimming around the Gulf of Chania.

We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)

When you load these comments, you'll be connected to Disqus. Privacy Statement.

### Related Articles Java Champion, author of the Javaspecialists Newsletter, conference speaking regular... About Heinz

#### Superpack 2020 Our entire Java Specialists Training in one huge bundle more...
##### Java Training

We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.

##### Java Consulting

We can help make your Java application run faster and trouble-shoot concurrency and performance bugs...

##### Java Emergency?

If your system is down, we will review it for 15 minutes and give you our findings for just 1 € without any obligation.