The Java Specialists' Newsletter
Issue 135 2006-11-06
Category:
Performance
Java version: JDK 5
Are you really Multi-Core?
by Dr. Heinz M. Kabutz
Abstract:
With Java 5, we can measure CPU cycles per thread. Here is a
small program that runs several CPU intensive tasks in
separate threads and then compares the elapsed time to the
total CPU time of the threads. The factor should give you
some indication of the CPU based acceleration that the multi
cores are giving you.
Welcome to the 135th edition of The Java(tm) Specialists' Newsletter, now sent from a
beautiful little island in Greece. We arrived safely two weeks
ago and have been running around organising the basics, such as
purchasing a vehicle, opening a bank account, getting cell phone
contracts. Things happen really quickly in Greece. We can get
my wife's Greek birth certificate in one week. In South Africa,
this took me about 4 months to do. In about a week's time, I
should be ready to apply for permanent residence here in Greece,
so now I am the "First Java Champion in Greece" :))
The Java Performance Tuning course almost didn't happen, due to
the hotel being washed into the sea by the storms. Fortunately
my friend George Niavradakis (who sells real estate in
Crete) jumped in and organised a new venue for us.
And the dinner at Irene's was unforgettable, as always!
Are you really Multi-Core?
A few weeks ago, I presented a Java 5 and a Design Patterns
Course in Cape Town to a bunch of developers. They were mostly
developing in Linux, and one of the chaps was impressing us all
with his multi-core machine. A Dell Latitude notebook, with
tons of RAM, a great graphics card, etc. It looked really fast,
especially the 3D effects of his desktop.
One of the exercises that we do in the Java 5 course is to
measure the CPU time that a thread has used, as opposed to
elapsed time. If you have one CPU in your machine, then the
two should be roughly the same. However, when you have several CPUs
in your machine, the total CPU time of all threads should be larger
than the elapsed time by some factor. The factor should never be more than the number
of actual CPUs, and may be less when you either have other
processes running, or too many threads per CPU. Also, as all
good computer scientists know, you can never scale completely
linearly on one machine, so as you approach a large number of
CPUs, the factor will grow more slowly.
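To make the bound concrete, here is a minimal sketch, assuming a hypothetical helper class FactorBound that is not part of the program in this newsletter. It divides the summed thread CPU time by the elapsed time and prints the number of processors that the JVM reports through Runtime.availableProcessors(). That call counts logical processors, so on some machines it may be higher than the number of physical cores.

```java
public class FactorBound {
    // Hypothetical helper: ratio of summed thread CPU time to wall clock time.
    static double factor(long totalCpuNanos, long elapsedNanos) {
        return (double) totalCpuNanos / elapsedNanos;
    }

    // The factor should not exceed the number of processors the JVM can see.
    static int maxExpectedFactor() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 2 seconds of CPU time in 1 second of wall time.
        double f = factor(2000000000L, 1000000000L);
        System.out.printf("Factor: %.2f (processors reported: %d)%n",
            f, maxExpectedFactor());
    }
}
```

A factor well below maxExpectedFactor() on an otherwise quiet machine is the kind of result that is worth investigating.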
Here is a short piece of code that starts 5 threads. Each
thread runs through a loop from 0 to 999999999. For each thread
we measure the thread CPU time with the new ThreadMXBean.
These are added up and then we divide the total by the elapsed
time (also called "wall clock time"). In order to not introduce
contention, I'm using the AtomicLong and the CountDownLatch.
import java.lang.management.*;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class MultiCoreTester {
  private static final int THREADS = 5;
  private static CountDownLatch ct = new CountDownLatch(THREADS);
  private static AtomicLong total = new AtomicLong();

  public static void main(String[] args)
      throws InterruptedException {
    long elapsedTime = System.nanoTime();
    for (int i = 0; i < THREADS; i++) {
      Thread thread = new Thread() {
        public void run() {
          total.addAndGet(measureThreadCpuTime());
          ct.countDown();
        }
      };
      thread.start();
    }
    ct.await();
    elapsedTime = System.nanoTime() - elapsedTime;
    System.out.println("Total elapsed time " + elapsedTime);
    System.out.println("Total thread CPU time " + total.get());
    double factor = total.get();
    factor /= elapsedTime;
    System.out.printf("Factor: %.2f%n", factor);
  }

  private static long measureThreadCpuTime() {
    ThreadMXBean tm = ManagementFactory.getThreadMXBean();
    long cpuTime = tm.getCurrentThreadCpuTime();
    long total = 0;
    for (int i = 0; i < 1000 * 1000 * 1000; i++) {
      // keep ourselves busy for a while ...
      // note: we had to add some "work" into the loop or Java 6
      // optimizes it away. Thanks to Daniel Einspanjer for
      // pointing that out.
      total += i;
      total *= 10;
    }
    cpuTime = tm.getCurrentThreadCpuTime() - cpuTime;
    System.out.println(total + " ... " + Thread.currentThread() +
        ": cpuTime = " + cpuTime);
    return cpuTime;
  }
}
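Not every JVM and operating system combination can measure thread CPU time, and the measurement may also be switched off. The following is a small sketch, assuming a hypothetical class CpuTimeSupportCheck, that asks the ThreadMXBean whether the numbers from getCurrentThreadCpuTime() can be trusted before you run the test. It only uses methods that ThreadMXBean has had since JDK 5.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeSupportCheck {
    // Returns true if getCurrentThreadCpuTime() should return a real value.
    static boolean cpuTimeUsable() {
        ThreadMXBean tm = ManagementFactory.getThreadMXBean();
        if (!tm.isCurrentThreadCpuTimeSupported()) {
            // This JVM cannot measure CPU time for the current thread.
            return false;
        }
        if (!tm.isThreadCpuTimeEnabled()) {
            // Measurement is supported but switched off, so try to enable it.
            tm.setThreadCpuTimeEnabled(true);
        }
        return tm.isThreadCpuTimeEnabled();
    }

    public static void main(String[] args) {
        System.out.println("Thread CPU time usable: " + cpuTimeUsable());
    }
}
```

If the check prints false, the factor computed by MultiCoreTester would not be meaningful on that machine.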
When I run this on my little D800 Latitude, I get:
Thread[Thread-3,5,main]: cpuTime = 1920000000
Thread[Thread-2,5,main]: cpuTime = 1920000000
Thread[Thread-1,5,main]: cpuTime = 1930000000
Thread[Thread-4,5,main]: cpuTime = 1920000000
Thread[Thread-0,5,main]: cpuTime = 1940000000
Total elapsed time 9759677000
Total thread CPU time 9630000000
Factor: 0.99
As always with performance testing, we have to be careful to
run it on a quiet machine. If I copy a large file at the same
time while running the test, I get:
Thread[Thread-0,5,main]: cpuTime = 1920000000
Thread[Thread-4,5,main]: cpuTime = 1990000000
Thread[Thread-2,5,main]: cpuTime = 1960000000
Thread[Thread-1,5,main]: cpuTime = 1980000000
Thread[Thread-3,5,main]: cpuTime = 1960000000
Total elapsed time 10979895000
Total thread CPU time 9810000000
Factor: 0.89
When I run the program twice in parallel on a quiet system, the
Factor should be close to 0.5, hopefully:
Thread[Thread-3,5,main]: cpuTime = 4090000000
Thread[Thread-4,5,main]: cpuTime = 4070000000
Thread[Thread-0,5,main]: cpuTime = 2660000000
Thread[Thread-2,5,main]: cpuTime = 4020000000
Thread[Thread-1,5,main]: cpuTime = 2970000000
Total elapsed time 33988220000
Total thread CPU time 17810000000
Factor: 0.52
And the second run, started slightly later:
Thread[Thread-1,5,main]: cpuTime = 3320000000
Thread[Thread-3,5,main]: cpuTime = 3120000000
Thread[Thread-4,5,main]: cpuTime = 3190000000
Thread[Thread-0,5,main]: cpuTime = 2590000000
Thread[Thread-2,5,main]: cpuTime = 3070000000
Total elapsed time 32353817000
Total thread CPU time 15290000000
Factor: 0.47
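The factors in the runs above can be re-derived from the two totals that each run prints. Here is a small sketch, assuming a hypothetical class FactorCheck, that repeats the division with the printed numbers. It formats with Locale.US so that the decimal point does not depend on the machine's default locale.

```java
import java.util.Locale;

public class FactorCheck {
    // Divides total thread CPU time by elapsed time, rounded to two decimals.
    static String factor(long totalCpuNanos, long elapsedNanos) {
        double f = (double) totalCpuNanos / elapsedNanos;
        return String.format(Locale.US, "%.2f", f);
    }

    public static void main(String[] args) {
        // Totals copied from the four runs shown above.
        System.out.println("quiet machine:       " + factor(9630000000L, 9759677000L));
        System.out.println("while copying file:  " + factor(9810000000L, 10979895000L));
        System.out.println("parallel, first run: " + factor(17810000000L, 33988220000L));
        System.out.println("parallel, second:    " + factor(15290000000L, 32353817000L));
    }
}
```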
When we ran this program on the student's supa-dupa multi-core
system, we were puzzled to find that the factor was just below 1.
We rebooted the machine into Windows, and the factor went up to
just below 2. Fortunately we had a system administrator in the
group, and he pointed out that the kernel on that Linux machine
was incorrect. By simply putting the correct kernel on, the
dream machine laptop was able to run at double the CPU cycles.
Your exercise for today is to find a multi-core or multi-CPU
machine and see what factor you get. You need at least JDK 5.
Let me know how you fare ... :)
Just a hint: the number of threads should probably be a multiple
of the number of CPUs or cores that you have available.
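One way to follow that hint is to derive the thread count from the number of processors that the JVM reports, instead of hard-coding 5. The following is a sketch under that assumption. The class CoreAwareTester and its methods are hypothetical names, and the measuring loop is a condensed variant of MultiCoreTester rather than the original program.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class CoreAwareTester {
    // Hypothetical helper: a multiple of the processors the JVM can see.
    static int threadCount(int multiplier) {
        return multiplier * Runtime.getRuntime().availableProcessors();
    }

    // Runs busy loops in 'threads' threads; returns total CPU time / elapsed time.
    static double measureFactor(int threads, final int iterations) {
        final CountDownLatch done = new CountDownLatch(threads);
        final AtomicLong totalCpu = new AtomicLong();
        final AtomicLong sink = new AtomicLong(); // keeps the loop result alive
        final ThreadMXBean tm = ManagementFactory.getThreadMXBean();
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            new Thread() {
                public void run() {
                    try {
                        long cpu = tm.getCurrentThreadCpuTime();
                        long sum = 0;
                        for (int i = 0; i < iterations; i++) {
                            sum += i;
                            sum *= 10;
                        }
                        sink.addAndGet(sum);
                        totalCpu.addAndGet(tm.getCurrentThreadCpuTime() - cpu);
                    } finally {
                        done.countDown(); // always release the waiting main thread
                    }
                }
            }.start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException("interrupted while waiting", e);
        }
        long elapsed = System.nanoTime() - start;
        return (double) totalCpu.get() / elapsed;
    }

    public static void main(String[] args) {
        int threads = threadCount(1);
        System.out.printf("Threads: %d, factor: %.2f%n",
            threads, measureFactor(threads, 1000 * 1000 * 1000));
    }
}
```

Calling threadCount(2) instead would give two threads per reported processor, which lets you see how the factor behaves once the processors are oversubscribed.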
Kind regards from Greece
Heinz