The Java Specialists' Newsletter
Issue 023, 2001-06-21, Category: Performance

Socket Wheel to handle many clients

by Dr. Heinz M. Kabutz

Welcome to the 23rd issue of The Java(tm) Specialists' Newsletter, where I try to get back to my roots of distributed performance evaluation. My PhD thesis was entitled "Analytical Performance Evaluation of Concurrent Communicating Systems using SDL and Stochastic Petri Nets", or something like that. The main idea was to automatically map protocols designed in the Specification and Description Language (SDL) to a modelling language called Stochastic Petri Nets, for the simple reason that there are a lot of well-known analytical techniques available for evaluating a Stochastic Petri Net's performance. But all that was in another era, when it did not matter that it would take a very long time to analyse a protocol of any real size. In fact, that was last millennium, i.e. long long ago, i.e. don't bother asking me any questions about it ;-)

When we measure performance we have to consider two main criteria: memory and CPU cycles. I was able to significantly reduce the amount of memory needed for the server with the idea presented here, but I did not manage to increase the speed at which clients are serviced, although it came close to that of the threaded server. If you think of anything that would improve the speed of the SocketWheel, please let me know, and you will earn instant fame in over 40 countries by being immortalized in my next newsletter.

Please forward this newsletter to as many people as you know who are interested in programming in Java at more-than-entry-level.

The typical way of implementing a server that needs to "talk back" to the client is to construct a thread for each client that is connected, normally through a thread pool. For example, consider the Server.java file:

// Server.java
import java.net.*;
import java.io.*;
public class Server {
  public static final int PORT = 4444;
  public Server(int port) throws IOException {
    ServerSocket ss = new ServerSocket(port);
    while(true) {
      new ServerThread(ss.accept());
    }
  }
  private class ServerThread extends Thread {
    private final Socket socket;
    public ServerThread(Socket socket) {
      this.socket = socket;
      start();
    }
    public void run() {
      try {
        ObjectOutputStream out = new ObjectOutputStream(
          socket.getOutputStream());
        ObjectInputStream in = new ObjectInputStream(
          socket.getInputStream());
        while(true) {
          in.readObject();
          out.writeObject("test");
          out.flush();
          out.reset();
        }
      } catch(Throwable t) {
        System.out.println("Caught " + t + " - closing thread");
      }
    }
  }
  public static void main(String[] args) throws IOException {
    new Server(PORT);
  }
}

All this does is read an object and write an object for as long as the client stays connected. When the client disconnects, the thread stops. The code is not very "clean"; I should handle the closing of sockets better, but I don't want to cloud the issue at stake here.
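For readers who want the cleaner variant, here is a minimal sketch of how the per-client loop could close its socket on every exit path. It is not the newsletter's code: it assumes Java 8 or later (try-with-resources and a lambda), and the names ClosingHandler, serve and demo are made up for illustration.

```java
// ClosingHandler.java - hypothetical sketch, not part of the newsletter code
import java.io.*;
import java.net.*;

class ClosingHandler {
  // Serves one client until it disconnects; the socket and both
  // streams are closed on every exit path by try-with-resources
  static void serve(Socket socket) {
    try (Socket s = socket;
         ObjectOutputStream out =
           new ObjectOutputStream(s.getOutputStream());
         ObjectInputStream in =
           new ObjectInputStream(s.getInputStream())) {
      while (true) {
        in.readObject();
        out.writeObject("test");
        out.flush();
        out.reset();
      }
    } catch (EOFException ex) {
      // the client closed its end of the connection
    } catch (IOException | ClassNotFoundException ex) {
      System.out.println("Caught " + ex + " - closing connection");
    }
  }

  // One request/reply round trip over the loopback interface
  static String demo() {
    InetAddress local = InetAddress.getLoopbackAddress();
    try (ServerSocket ss = new ServerSocket(0, 1, local)) {
      Socket[] served = new Socket[1];
      Thread server = new Thread(() -> {
        try {
          served[0] = ss.accept();
          serve(served[0]);
        } catch (IOException ex) {
          System.out.println("Accept failed: " + ex);
        }
      });
      server.start();
      Object reply;
      try (Socket socket = new Socket(local, ss.getLocalPort());
           ObjectOutputStream out =
             new ObjectOutputStream(socket.getOutputStream());
           ObjectInputStream in =
             new ObjectInputStream(socket.getInputStream())) {
        out.writeObject("ping");
        out.flush();
        reply = in.readObject();
      }
      server.join(5000); // wait for the handler to notice the disconnect
      return reply + " closed=" + served[0].isClosed();
    } catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The memory cost per client is unchanged by this; the only difference is that a disconnected client no longer leaves its socket to be cleaned up by chance.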

A client would typically look like this (send an object, read an object, wait some time, etc.):

// Client.java
import java.net.*;
import java.io.*;
public class Client {
  public Client(int port) throws Exception {
    Socket socket = new Socket("localhost", port);
    ObjectOutputStream out = new ObjectOutputStream(
      socket.getOutputStream());
    ObjectInputStream in = new ObjectInputStream(
      socket.getInputStream());
    for (int i=0; i<10; i++) {
      out.writeObject(new Integer(i));
      out.flush();
      out.reset();
      System.out.println(in.readObject());
      Thread.sleep(1000);
    }
  }
  public static void main(String[] args) throws Exception {
    new Client(Server.PORT);
  }
}

In order to test what happens when a lot of clients connect, I wrote a MultiClient class, which constructs 3500 sockets and an equal number of object output and input streams to use for sending messages. It then cycles through the sockets and writes one object to each, then cycles through them again and reads one object from each. The reason I chose 3500 sockets is that my little notebook could not open more than that; don't ask why, I don't know. That number is *probably* system dependent, so if you get an exception when trying to create a socket, try again with fewer sockets.

// MultiClient.java
import java.net.*;
import java.io.*;
public class MultiClient {
  public MultiClient(int port) throws Exception {
    long time = -System.currentTimeMillis();
    Socket[] sockets = new Socket[3500];
    ObjectOutputStream[] outs =
      new ObjectOutputStream[sockets.length];
    ObjectInputStream[] ins =
      new ObjectInputStream[sockets.length];
    for (int i=0; i<sockets.length; i++) {
      sockets[i] = new Socket("localhost", port);
      outs[i] = new ObjectOutputStream(
        sockets[i].getOutputStream());
      ins[i] = new ObjectInputStream(
        sockets[i].getInputStream());
    }
    System.out.println("Constructed all sockets");
    for (int j=0; j<32; j++) {
      long iterationTime = -System.currentTimeMillis();
      for (int i=0; i<sockets.length; i++) {
        outs[i].writeObject(new Integer(i));
        outs[i].flush();
        outs[i].reset();
      }
      System.out.println(j + ": Written to all sockets");
      for (int i=0; i<sockets.length; i++) {
        ins[i].readObject();
      }
      System.out.println(j + ": Read from all sockets");
      iterationTime += System.currentTimeMillis();
      System.out.println(j + ": Iteration took " +
        iterationTime + "ms");
    }
    time += System.currentTimeMillis();
    System.out.println("Writing to " + sockets.length +
      " sockets 32 times took " + time + "ms");
  }
  public static void main(String[] args) throws Exception {
    new MultiClient(Server.PORT);
  }
}

This all works quite nicely, except that each thread in the JDK1.3 implementation of the VM takes up 20KB for its stack. When you add up all the other memory taken up by streams and sockets, it comes to 97MB used on the server, just to handle a paltry 3500 clients! So, if we had 35000 clients connecting, presuming that our machine can handle that many sockets, we would not only create 35000 threads (none of which does very much, but it still takes a long time to construct them all, even if you use a thread pool) but we would also gobble up almost 1GB of memory!!!
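The extrapolation is plain proportional arithmetic. As a rough sketch (the class and method names are hypothetical, and it assumes memory grows linearly with the number of clients, which a single measurement can only suggest):

```java
// MemoryEstimate.java - hypothetical helper, not part of the newsletter code
class MemoryEstimate {
  // Measured figures from the text: 3500 clients used about 97MB
  static final int MEASURED_CLIENTS = 3500;
  static final double MEASURED_MB = 97.0;

  // ASSUMPTION: memory grows linearly with the number of clients
  static double estimateMb(int clients) {
    return MEASURED_MB / MEASURED_CLIENTS * clients;
  }

  // Average memory per connected client, in KB
  static double perClientKb() {
    return MEASURED_MB * 1024 / MEASURED_CLIENTS;
  }

  public static void main(String[] args) {
    System.out.println("Per client: " + perClientKb() + "KB");
    System.out.println("35000 clients: " + estimateMb(35000) + "MB");
  }
}
```

This works out to roughly 28KB per client, of which the 20KB thread stack is the largest single part, and to about 970MB for 35000 clients.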

I was pondering this problem a few days ago and came up with an idea to use polling (yes, I know, polling sucks) to avoid making a thread for each client. Instead of having a server which uses a thread for each connected client, we keep a list of the sockets that are connected to the server. Writing to a socket will block if the TCP buffer, which is about 64000 bytes, is full, so the server could hang completely if a client decides not to read from its stream. I will conveniently ignore that problem in this newsletter.

We want to have an ObjectOutputStream and an ObjectInputStream associated with each Socket, so we make an inner class to contain those values, which we call a SocketBucket. To make connecting fast, we keep two lists of SocketBuckets, one for the new sockets and one for the already connected sockets. We then run through all the sockets and try to read from each of them with a timeout of 1 millisecond. If there is nothing to read we get an InterruptedIOException and go to the next socket.
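The timed read can be tried out in isolation. The sketch below is not the SocketWheel itself: it reads a single raw byte instead of an object to keep the demo small, it assumes Java 8 or later, and the names TimeoutPoll, poll and demo are made up for illustration. On current JDKs the exception thrown on a timeout is SocketTimeoutException, which is a subclass of InterruptedIOException.

```java
// TimeoutPoll.java - hypothetical demo, not part of the newsletter code
import java.io.*;
import java.net.*;

class TimeoutPoll {
  // Returns true if a byte arrived within timeoutMs; false if the
  // read timed out or the peer closed the connection
  static boolean poll(Socket socket, int timeoutMs) throws IOException {
    socket.setSoTimeout(timeoutMs);
    try {
      return socket.getInputStream().read() != -1;
    } catch (InterruptedIOException ex) {
      return false; // nothing to read yet, move on to the next socket
    }
  }

  // Polls a loopback connection once before and once after a write
  static boolean[] demo() {
    InetAddress local = InetAddress.getLoopbackAddress();
    try (ServerSocket ss = new ServerSocket(0, 1, local);
         Socket client = new Socket(local, ss.getLocalPort());
         Socket served = ss.accept()) {
      boolean before = poll(served, 1);
      client.getOutputStream().write(42);
      client.getOutputStream().flush();
      boolean after = poll(served, 1000);
      return new boolean[] { before, after };
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    boolean[] result = demo();
    System.out.println("data before write: " + result[0]);
    System.out.println("data after write: " + result[1]);
  }
}
```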

If we find at least one socket that had some data, we immediately go looking again; otherwise we go dream for a while and then go looking for more data. There are many different ways in which we could tune this approach. For example, you could keep a set of the last sockets which had data and push their priority up or down, depending on stochastic prediction techniques (not that I remember what that means - just sounded cool!). Another disadvantage of this approach is that the server has to wait for an entire millisecond before looking at the next socket. It would be much better to wait less; otherwise, if you have 1000 idle sockets connected, it will take 1 second just to check whether any of them has data waiting. Unfortunately, 1 ms is the shortest that we can wait with Java sockets.
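Both the cost of one sweep and the length of the "dream" are easy to put into numbers. The following is a small sketch with hypothetical names; the constants 6, 1.5 and 1000 are the ones used in the SocketWheel listing, and the hard cap at 1000ms is an assumption of this sketch.

```java
// PollingMath.java - hypothetical helper, not part of the newsletter code
class PollingMath {
  static final long MIN_DREAM_MS = 6;
  static final long MAX_DREAM_MS = 1000;

  // Worst case for one sweep: every socket is idle and uses up
  // its full read timeout before we move on to the next one
  static long worstCaseSweepMs(int sockets, int timeoutMs) {
    return (long) sockets * timeoutMs;
  }

  // After a sweep that found data, reset to the shortest dream;
  // otherwise grow the dream by 1.5x, capped at MAX_DREAM_MS
  static long nextDreamMs(long current, boolean foundSomething) {
    if (foundSomething) {
      return MIN_DREAM_MS;
    }
    return Math.min(MAX_DREAM_MS, current * 3 / 2);
  }

  public static void main(String[] args) {
    System.out.println("1000 idle sockets: " +
      worstCaseSweepMs(1000, 1) + "ms per sweep");
    long dream = 10;
    for (int i = 0; i < 5; i++) {
      dream = nextDreamMs(dream, false);
      System.out.println("idle sweep " + i + ": dream " + dream + "ms");
    }
  }
}
```

With the 3500 sockets of the MultiClient, a sweep over a completely idle server takes at least 3.5 seconds, which is why the read timeout dominates the cost of this design.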

Here is the code for the SocketWheel:

// SocketWheel.java
import java.net.*;
import java.io.*;
import java.util.*;
public class SocketWheel {
  // the list contains SocketBuckets
  private final List sockets = new LinkedList();
  // we don't want to block a new connection while we are busy
  // serving the existing ones
  private final List newSockets = new LinkedList();
  public SocketWheel() {
    new ServerThread();
  }
  public void addSocket(Socket socket) throws IOException {
    synchronized(newSockets) {
      newSockets.add(new SocketBucket(socket));
      newSockets.notify();
    }
  }
  private class SocketBucket {
    public final Socket socket;
    public final ObjectOutputStream out;
    public final ObjectInputStream in;
    public SocketBucket(Socket socket) throws IOException {
      this.socket = socket;
      out = new ObjectOutputStream(socket.getOutputStream());
      in = new ObjectInputStream(socket.getInputStream());
      socket.setSoTimeout(1);  // VERY short timeout
    }
  }
  private class ServerThread extends Thread {
    public ServerThread() {
      super("ServerThread");
      start();
    }
    public void run() {
      long dreamTime = 10;
      boolean foundSomething;
      while(true) {
        try {
          synchronized(newSockets) {
            sockets.addAll(newSockets);
            newSockets.clear();
          }
          foundSomething = false;
          Iterator it = sockets.iterator();
          while(it.hasNext()) {
            SocketBucket bucket = (SocketBucket)it.next();
            try {
              bucket.in.readObject();
              foundSomething = true;
              bucket.out.writeObject("test");
              bucket.out.flush();
              bucket.out.reset();
            } catch(InterruptedIOException ex) {
              // just skip this socket
            } catch(IOException ex) {
              it.remove();
            }
          }
          if (foundSomething) {
            dreamTime = 6;
          } else {
            // grow the sleep by 1.5x, but never beyond 1000ms
            dreamTime = Math.min(1000, (long)(dreamTime * 1.5));
            synchronized(newSockets) {
              // only sleep if we didn't find anything
              newSockets.wait(dreamTime);
            }
          }
        } catch(Throwable t) {
          System.out.println("Caught " + t + " - remove socket");
        }
      }
    }
  }
  public static void main(String[] args) throws IOException {
    SocketWheel wheel = new SocketWheel();
    ServerSocket ss = new ServerSocket(Server.PORT);
    while(true) {
      Socket socket = ss.accept();
      wheel.addSocket(socket);
    }
  }
}

When I connect to the SocketWheel server with the MultiClient, the server uses only 32MB of RAM, basically one third of what the other server uses. However, this approach is a little slower than threading, and it is a lot more complicated. In addition, the whole example sometimes gets stuck; I don't know why. If the MultiClient stops making progress and the CPU goes to 0%, you'll have to restart the MultiClient. (If you spot the problem, please let me know. I suspect it's an underlying C implementation problem, which is why I'm not pursuing it.)

With the SocketWheel, the test took 2:36 minutes to complete, whereas the normal threaded Server took only 2:23 minutes. CPU was at 100% both times and disk usage was 0%. The difference in speed is not that great, whereas the memory usage is only 34MB in the SocketWheel server, i.e. roughly 1/3 of the threaded server.

When we change the MultiClient to use only 350 sockets, the SocketWheel takes 14 seconds and the threaded server only 11 seconds. The SocketWheel uses 9.4MB and the threaded server 14MB. That is a smaller difference, probably because these figures are the total memory used by java.exe as measured with the task manager, i.e. they include the memory used by the JVM itself.
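To compare the two servers on equal terms, it helps to express each SocketWheel figure as a fraction of the threaded server's figure. This is just a sketch of that arithmetic using the measurements quoted above; the class and method names are hypothetical.

```java
// WheelVsThreads.java - hypothetical helper, not part of the newsletter code
class WheelVsThreads {
  // Converts a "m:ss" duration such as "2:36" into seconds
  static int seconds(String minSec) {
    String[] parts = minSec.split(":");
    return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
  }

  // SocketWheel figure relative to the threaded server figure
  static double ratio(double wheel, double threaded) {
    return wheel / threaded;
  }

  public static void main(String[] args) {
    System.out.println("3500 sockets, time:   " +
      ratio(seconds("2:36"), seconds("2:23")));
    System.out.println("3500 sockets, memory: " + ratio(34, 97));
    System.out.println("350 sockets, time:    " + ratio(14, 11));
    System.out.println("350 sockets, memory:  " + ratio(9.4, 14));
  }
}
```

At 3500 sockets the SocketWheel is about 9% slower and uses about 35% of the memory; at 350 sockets it is about 27% slower and uses about 67% of the memory.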

It was quite fun writing this SocketWheel, and it was actually a lot easier than I thought possible. Please tell me if you've done something similar or if you think of ways to improve the speed of the SocketWheel server. I tried having a couple of threads in the SocketWheel, but the whole system just got stuck more often and did not get any faster. Please don't use the SocketWheel "as is" unless you're willing to discover and fix the bug that makes it get stuck, and to cater for clients that do not read from their sockets.

I always appreciate any feedback, both positive and negative, so please keep sending your ideas and suggestions. Please also remember to take the time to send this newsletter to others who are interested in Java.

Heinz

© 2010-2016 Heinz Kabutz - All Rights Reserved
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. JavaSpecialists.eu is not connected to Oracle, Inc. and is not sponsored by Oracle, Inc.