The Java Specialists' Newsletter
Issue 028 2001-08-14
Category: Language
Java version:
Multicasting in Java
by Paul van Spronsen
1. Introduction
This article deals primarily with the subject of multicast
communication in Java. I have, however, included some background
information to refresh the memory of those who have forgotten how
much they know about data communications. If the concepts
"datagram", "IP fragment", "reliable protocol" or "multicast" are
not clear to you, try referring to the appendices. If the
appendices appear shrouded in mystery, go back to your data comms
lecturer and demand a refund.
2. Sending multicast datagrams
In order to send any kind of datagram in Java, be it unicast,
broadcast or multicast, one needs a
java.net.DatagramSocket:
    DatagramSocket socket = new DatagramSocket();
One can optionally supply a local port to the DatagramSocket
constructor, to which the socket then binds. This is only
necessary if other parties need to be able to reach us at a
specific port. A third constructor takes the local port AND the
local IP address to which to bind. This is used (rarely) with
multi-homed hosts where it matters on which network adapter
the traffic is received. Neither of these is necessary for this
example.
This sample code creates the socket and a datagram to send and
then simply sends the same datagram every second:
    DatagramSocket socket = new DatagramSocket();
    byte[] b = new byte[DGRAM_LENGTH];
    DatagramPacket dgram;
    dgram = new DatagramPacket(b, b.length,
        InetAddress.getByName(MCAST_ADDR), DEST_PORT);
    System.err.println("Sending " + b.length + " bytes to " +
        dgram.getAddress() + ':' + dgram.getPort());
    while(true) {
        System.err.print(".");
        socket.send(dgram);
        Thread.sleep(1000);
    }
Valid values for the constants are:
- DGRAM_LENGTH: anything from 0 to 65507 (see section 5), e.g. 32
- MCAST_ADDR: any class D address (see appendix D), e.g. 235.1.1.1
- DEST_PORT: an unsigned 16-bit integer, e.g. 7777
It is important to note the following points:
- DatagramPacket does not make a copy of the byte array given to
it, so any change to the byte array before the socket.send() will
be reflected in the data actually sent;
- One can send the same DatagramPacket to several different
destinations by changing the address and/or port using the
setAddress() and setPort() methods;
- One can send different data to the same destination by
changing the byte array referred to using setData() and
setLength(), or by changing the contents of the byte array the
DatagramPacket is referring to;
- One can send a subset of the data in the byte array by
manipulating offset and length through the three-argument
setData(buf, offset, length) and setLength() methods
(DatagramPacket has getOffset() but no setOffset()).
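The points above can be checked without touching the network. The following is a minimal sketch, not part of the original article: the class name PacketReuseSketch and the helper newPacket() are made up for illustration, and the group address and port are simply the example values used earlier.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class PacketReuseSketch {

    // Hypothetical helper: builds a packet addressed to the example group and port.
    static DatagramPacket newPacket(byte[] buf) {
        try {
            // A literal IP address is parsed directly; no name lookup takes place.
            InetAddress group = InetAddress.getByName("235.1.1.1");
            return new DatagramPacket(buf, buf.length, group, 7777);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] buf = {10, 20, 30, 40, 50, 60};
        DatagramPacket dgram = newPacket(buf);

        // No copy is made: the packet sees changes made to the array afterwards.
        buf[0] = 99;
        System.out.println(dgram.getData() == buf); // true
        System.out.println(dgram.getData()[0]);     // 99

        // Restrict the packet to buf[2..4] by narrowing offset and length.
        dgram.setData(buf, 2, 3);
        System.out.println(dgram.getOffset() + " " + dgram.getLength()); // 2 3

        // Retarget the same packet at another port.
        dgram.setPort(8888);
        System.out.println(dgram.getPort()); // 8888
    }
}
```

Nothing is sent here; the sketch only inspects the packet's own state, which is what socket.send() would read at the moment of sending.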
3. Receiving multicast datagrams
One can use a normal DatagramSocket to send and receive unicast
and broadcast datagrams and to send multicast datagrams, as seen
in section 2. In order to receive multicast datagrams,
however, one needs a MulticastSocket. The reason for this is
simple: additional work needs to be done by all the protocol
layers below UDP to control and receive multicast traffic.
The example given below opens a multicast socket, binds it to a
specific port and joins a specific multicast group:
    byte[] b = new byte[BUFFER_LENGTH];
    DatagramPacket dgram = new DatagramPacket(b, b.length);
    MulticastSocket socket =
        new MulticastSocket(DEST_PORT); // must bind receive side
    socket.joinGroup(InetAddress.getByName(MCAST_ADDR));
    while(true) {
        socket.receive(dgram); // blocks until a datagram is received
        System.err.println("Received " + dgram.getLength() +
            " bytes from " + dgram.getAddress());
        dgram.setLength(b.length); // must reset length field!
    }
Values for DEST_PORT and MCAST_ADDR must match those in the
sending code for the listener to receive the datagrams sent
there. BUFFER_LENGTH should be at least as long as the data we
intend to receive. If BUFFER_LENGTH is shorter, the data will be
truncated silently and dgram.getLength() will
return b.length.
The MulticastSocket.joinGroup() method causes the lower protocol
layers to be informed that we are interested in multicast traffic
to a particular group address. One may execute joinGroup() many
times to subscribe to different groups. If multiple
MulticastSockets bind to the same port and join the same
multicast group, they will all receive copies of multicast
traffic sent to that group/port.
As with the sending side, one can re-use one's DatagramPacket and
byte-array instances. The receive() method sets length to the
amount of data received, so remember to reset the length field in
the DatagramPacket before subsequent receives, otherwise you will
be silently truncating all your incoming data to the length of
the shortest datagram previously received.
One can set a timeout on the receive() operation using
socket.setSoTimeout(timeoutInMilliseconds). If the timeout is
reached before a datagram is received, the receive() throws a
java.io.InterruptedIOException. The socket is still valid and
usable for sending and receiving if this happens.
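A receive with a timeout might be wrapped as in the sketch below. It is an illustration under stated assumptions rather than the author's code: the class ReceiveTimeoutSketch and the methods receiveOrTimeout() and demo() are hypothetical names, and the demonstration uses a plain DatagramSocket sending unicast to itself on the loopback address so that it does not depend on multicast being available.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class ReceiveTimeoutSketch {

    // Returns the number of bytes received, or -1 if the timeout expired first.
    static int receiveOrTimeout(DatagramSocket socket, DatagramPacket dgram,
                                int timeoutMillis) {
        try {
            socket.setSoTimeout(timeoutMillis);
            socket.receive(dgram);
            return dgram.getLength();
        } catch (InterruptedIOException e) {
            return -1; // timed out; the socket remains usable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loopback-only demonstration: one receive that times out, then one
    // attempted after a datagram has been sent to the socket's own port.
    static int[] demo() {
        try (DatagramSocket socket =
                 new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            byte[] b = new byte[32];
            DatagramPacket in = new DatagramPacket(b, b.length);
            int first = receiveOrTimeout(socket, in, 200); // nothing sent yet

            byte[] payload = {1, 2, 3, 4, 5};
            socket.send(new DatagramPacket(payload, payload.length,
                    socket.getLocalAddress(), socket.getLocalPort()));
            in.setLength(b.length); // reset before re-using the packet
            int second = receiveOrTimeout(socket, in, 2000);
            return new int[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("first receive: " + r[0] + ", second receive: " + r[1]);
    }
}
```

The first receive should report -1 because nothing has been sent. The second normally reports 5, although UDP gives no guarantee even on the loopback interface.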
4. Multicasting and serialization
We have seen in the previous sections that we can multicast
anything we can fit into a byte array. Conveniently for us, one
of those things is a serialized object.
Object serialization is based on the assumption of a stream
(ObjectOutputStream, ObjectInputStream),
so we have to do a little massaging to squeeze this into our datagram paradigm.
ObjectOutputStream writes a stream header (containing a magic
number and version number) to the stream on construction and
ObjectInputStream reads and checks this on construction (ever
wondered why ObjectInputStream's constructor blocks until the
ObjectOutputStream has been constructed on the sending side?).
This is the reason one always attaches the ObjectOutputStream to
the outgoing side of a socket before attaching the
ObjectInputStream to the incoming side.
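The stream header can be made visible by constructing an ObjectOutputStream over a byte array and looking at what arrives before any object is written. This is a small sketch for illustration; StreamHeaderSketch, headerBytes() and toHex() are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class StreamHeaderSketch {

    // Returns whatever ObjectOutputStream emits before any object is written.
    static byte[] headerBytes() {
        try {
            ByteArrayOutputStream bOut = new ByteArrayOutputStream();
            ObjectOutputStream oOut = new ObjectOutputStream(bOut);
            oOut.flush(); // make sure the header has reached bOut
            return bOut.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Magic number 0xACED followed by stream version 0x0005.
        System.out.println(toHex(headerBytes())); // AC ED 00 05
    }
}
```

These four bytes are what an ObjectInputStream constructor waits for, which is why each datagram carrying a serialized object needs its own copy.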
In order to multicast objects, we need to arrange that the stream
header information is in each datagram. The simplest way to
ensure this is to create a new ObjectOutputStream for each
datagram we send and a new ObjectInputStream for each one we
receive. We could probably avoid these instantiations by
extending the two classes in question, but I'm not going into
that here.
On the sending side, we can do something like this:
    ByteArrayOutputStream b_out = new ByteArrayOutputStream();
    ObjectOutputStream o_out = new ObjectOutputStream(b_out);
    o_out.writeObject(new Message());
    byte[] b = b_out.toByteArray();
    DatagramPacket dgram = new DatagramPacket(b, b.length,
        InetAddress.getByName(MCAST_ADDR), DEST_PORT); // multicast
    socket.send(dgram);
On the receiving side, we can do something like this:
    byte[] b = new byte[65535];
    ByteArrayInputStream b_in = new ByteArrayInputStream(b);
    DatagramPacket dgram = new DatagramPacket(b, b.length);
    socket.receive(dgram); // blocks
    ObjectInputStream o_in = new ObjectInputStream(b_in);
    Object o = o_in.readObject();
    dgram.setLength(b.length); // must reset length field!
    b_in.reset(); // reset so next read is from start of byte[] again
Note that one can re-use the ByteArray*Streams, byte arrays and
DatagramPackets on both sides. Only the Object*Streams need be
recreated.
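The two halves can be tried together without a socket by treating a byte array as the datagram payload. The sketch below makes some assumptions: DatagramSerializationSketch, toBytes() and fromBytes() are hypothetical names, a String stands in for the article's Message class, and for brevity it creates fresh ByteArray*Streams each time instead of re-using them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DatagramSerializationSketch {

    // Sender side: a fresh ObjectOutputStream per datagram, so that every
    // payload starts with its own stream header.
    static byte[] toBytes(Serializable obj) {
        try {
            ByteArrayOutputStream bOut = new ByteArrayOutputStream();
            try (ObjectOutputStream oOut = new ObjectOutputStream(bOut)) {
                oOut.writeObject(obj);
            }
            return bOut.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: a fresh ObjectInputStream per datagram, reading only the
    // region the packet reported via getOffset() and getLength().
    static Object fromBytes(byte[] buf, int offset, int length) {
        try (ObjectInputStream oIn = new ObjectInputStream(
                new ByteArrayInputStream(buf, offset, length))) {
            return oIn.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = toBytes("hello group");

        // Simulate a large receive buffer that is only partly filled.
        byte[] receiveBuffer = new byte[65535];
        System.arraycopy(payload, 0, receiveBuffer, 0, payload.length);

        System.out.println(fromBytes(receiveBuffer, 0, payload.length)); // hello group
    }
}
```

With real sockets, the payload would go into a DatagramPacket on the sending side, and fromBytes() would be called with dgram.getData(), dgram.getOffset() and dgram.getLength() after receive().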
5. Datagram sizes
The IP spec allows for datagrams up to 65535 bytes in length,
including the IP header. If the underlying protocol layers
cannot support this size (Ethernet's MTU is 1500 bytes), IP
fragments the datagrams into several smaller datagrams. On the
receive side, IP reassembles the datagram before delivering it to
higher layer protocols, like UDP. If any of the fragments do not
arrive at the destination, the entire datagram is discarded, i.e.
there is no partial delivery of IP and therefore UDP datagrams.
Since the normal IP header is 20 bytes long and the UDP header is
always 8 bytes long, one would expect the maximum UDP data length
to be 65535-8-20 = 65507. Somehow, however, the combination of
Win2k and JDK1.3.1 manages to successfully send as much as 65527
bytes per datagram. I would be interested to hear whether users
of a real operating system experienced the same.
It is very important to note that although the IP spec allows
for datagrams up to 65535 bytes, it only requires implementations
to support IP datagrams of up to 576 bytes, including IP and
higher protocol headers. Since the maximum IP header length is 60
bytes and the UDP header length is 8, it is safe to send UDP
datagrams carrying up to 508 bytes of data and expect the
receiving side to handle them (yes, even your Palm Pilot if it
has a TCP/IP stack). I have not come across a full size (i.e.
non-embedded) system that cannot handle the full 64k-1, though.
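The size arithmetic in this section can be written out as a tiny helper. This is only a sketch: UdpPayloadSketch and maxUdpPayload() are hypothetical names, and the 60-byte figure assumes the largest possible IPv4 header (the 4-bit header-length field counts 32-bit words, so 15 x 4 bytes).

```java
public class UdpPayloadSketch {

    static final int UDP_HEADER_LENGTH = 8; // fixed size of the UDP header

    // Largest UDP data length that fits in an IP datagram of the given total
    // size, given the length of the IP header in front of it.
    static int maxUdpPayload(int ipDatagramSize, int ipHeaderLength) {
        return ipDatagramSize - ipHeaderLength - UDP_HEADER_LENGTH;
    }

    public static void main(String[] args) {
        // Largest IP datagram with the normal 20-byte header.
        System.out.println(maxUdpPayload(65535, 20)); // 65507

        // Minimum size every implementation must accept, assuming the
        // largest possible IPv4 header of 60 bytes.
        System.out.println(maxUdpPayload(576, 60)); // 508
    }
}
```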
6. Effect of fault conditions
UDP does not guarantee delivery or notification of non-delivery.
If you send a unicast packet to a host that does not exist, is
down or is not listening on that port, you will not know about
it. If you send a broadcast or multicast packet and nobody
receives it or is even listening, you will not know about it.
On Win2k the network adapter settings are reset if it is detected
that the link is not available. With Ethernet, for example, if
you unplug the LAN cable so that there is no link available,
Win2K detects this and effectively shuts down the adapter at the
IP level. It clears its IP address and will not attempt to use
it. The effect of this is that sockets cannot bind to a port, so
all new *Socket calls fail. Sockets that are already created
function correctly if you unplug and replug the cable.
On my notebook, local communication (sender and listener on the
same machine) began to fail when I unplugged the LAN cable. It
gets nastier than this: a listener started before I unplugged
the cable could not hear traffic from a sender started after I
had plugged the cable back in. But wait, there's more! Once I
started another listener after the cable was back in, both it
and the listeners started before I unplugged the cable received
the multicasts again.
On WinNT4, my experience has been that the adapter is not
"shut down" when the cable is unplugged and one does not have
these weird effects.
7. Multiple listeners and unicast packets
Since one can send unicast packets using the same MulticastSocket
instance as for one's multicasts, it makes sense to mention how
unicasts are handled when there is more than one listener, which
can only be when they are all on the same machine.
Unicast traffic sent to the port will be received by only one of
the listeners with a socket bound to the port. With my test
setup, the last socket to bind to the port receives the unicast
traffic. On WinNT4, the first one to bind receives it. I don't
know of any rules covering how unicast traffic should be handled
in the case of multiple listeners, so don't rely on it being
handled in any particular way.
8. Further reading
See the RFCs for IP(791), UDP(768) and IP multicasting(1112).
Compared to some of the ISO and IEEE stuff I've seen, they're
recreational reading material.
Appendix A. Protocol "reliability"
You may have heard TCP described as a "reliable" protocol and UDP
as an "unreliable" protocol. It is easy, but dangerous, to jump
to conclusions about what this means. Being "reliable" does not
mean that TCP will deliver your data under all circumstances (try
unplugging the LAN cable for a day and see). Being "unreliable"
does not mean UDP will arbitrarily throw away your data.
"Unreliable" is a loaded term and I prefer to use "non-reliable",
which indicates more that it lacks the guarantees of a "reliable"
protocol, rather than labelling it as some sort of untrustworthy
servant.
Enough about what reliability, or lack of it, does not mean. A
"reliable" protocol like TCP guarantees that it will deliver your
data correctly and in order of transmission
or inform you that it could not.
A "non-reliable" protocol, like UDP, does what is called
"best-effort delivery". Essentially, given enough available
resources (buffers, bandwidth etc) UDP will deliver your data
correctly. It will not deliver incorrect data, but it could
deliver data in a different order from that in which it was sent,
or not at all.
The NFS (Network File System) protocol uses UDP to communicate
between the server and the client. IMHO, this is a testament to
the "reliability" of UDP as a transport. Of course, NFS
implements its own reliability mechanisms (timeouts and
retransmissions) on top of UDP to be sure.
Appendix B. Stream vs Datagrams
The differences between TCP and UDP don't end with reliability.
They are fundamentally different in their data model. TCP is
stream based and UDP is datagram based. This means that with
UDP, if data is lost or delivered out of order, it happens with
datagram granularity.
Since TCP is stream based, it does not honour your message
boundaries. If you implement your own message passing system
using TCP, you will find that doing a send() call of n bytes on
one side of the connection does not necessarily result in n bytes
being returned by the "corresponding" read() call on the other
side. TCP rides on top of IP, which is datagram based, so there
is packetizing happening when TCP data is sent, but TCP is at
liberty to split your send() up into several actual packets or to
coalesce several send() operations into one packet.
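A common way to recover message boundaries on a stream is to prefix each message with its length. The sketch below shows one way of doing that and is not something the article prescribes: FramingSketch, writeMessage(), readMessage() and roundTrip() are hypothetical names, and in-memory streams stand in for the streams of a TCP socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FramingSketch {

    // Writes a 4-byte length followed by the message bytes.
    static void writeMessage(DataOutputStream out, byte[] msg) throws IOException {
        out.writeInt(msg.length);
        out.write(msg);
    }

    // Reads one complete message, however the bytes were split in transit.
    static byte[] readMessage(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] msg = new byte[length];
        in.readFully(msg); // keeps reading until all 'length' bytes have arrived
        return msg;
    }

    // Writes all messages back to back, then reads them out again one by one.
    static List<String> roundTrip(String... messages) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(wire);
            for (String m : messages) {
                writeMessage(out, m.getBytes(StandardCharsets.UTF_8));
            }
            out.flush();

            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(wire.toByteArray()));
            List<String> received = new ArrayList<>();
            for (int i = 0; i < messages.length; i++) {
                received.add(new String(readMessage(in), StandardCharsets.UTF_8));
            }
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("first", "", "third message"));
        // [first, , third message]
    }
}
```

With UDP none of this is needed, since each datagram already is one message.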
Appendix C. nCasting
In the case of TCP, the number of intended recipients of
transmitted data is always exactly one (like a telephone call).
In general, this is not the case. Everybody is aware of
broadcast communication (like radio or television) where there is
one sender and any number of recipients. As most people know, the
same exists in data communications.
Broadcast communication is frowned upon by network admins because
they spend a huge portion of their budget trying to provide
bandwidth using network switches, only to have this all defeated
by broadcast traffic being delivered to every segment of their
LANs. Broadcast communication also causes an interrupt and the
associated processing on every node on the connected LAN,
always. One's Ethernet hardware, for example, cannot determine
whether the host is interested in any particular broadcast packet
and must therefore deliver the packet to the upper protocol
layers to make the decision. This is the reason Doom 1.1 network
games were banned on many LANs: the number of broadcasts used
caused very high interrupt processing loads on all the hosts on
networks where it was played. Thankfully, Doom 1.2 came along to
avert boredom during my time at university.
Where broadcasting is a mechanism intended to deliver data to all
hosts on a network or subnetwork, multicasting is a mechanism to
deliver data to a group of interested hosts on a network. Many
network adapters provide some sort of rudimentary multicast
filtering. In many cases, a host not interested in a particular
multicast group will not even be interrupted by its network
hardware.
In the TCP/IP protocol family, UDP is used for broadcast and
multicast (and some unicast) traffic. As a result, broadcast and
multicast traffic is datagram based and non-reliable.
Reliability, datagram vs stream based and unicast vs
multicast/broadcast traffic are all orthogonal concepts. It is
not inconceivable to have a reliable, stream based multicast
protocol, or any other combination of those features.
Appendix D. IP Multicast addresses
All class D IP addresses are multicast addresses. Class D IP
addresses are those that begin with 1110, that is, all addresses
from 224.0.0.0 to 239.255.255.255. Some are pre-assigned for
specific applications, but most are available for forming ad hoc
multicast groups. There is a mapping between IP multicast
addresses and Ethernet addresses, described in
RFC1112:
"An IP host group address is mapped to an Ethernet multicast
address by placing the low-order 23-bits of the IP address into
the low-order 23 bits of the Ethernet multicast address
01-00-5E-00-00-00 (hex). Because there are 28 significant bits in
an IP host group address, more than one host group address may
map to the same Ethernet multicast address."
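The mapping quoted above is easy to express in code. The following sketch is an illustration only: MulticastMacSketch and ethernetAddressFor() are hypothetical names, and the method expects a literal IPv4 address so that no name lookup is involved.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class MulticastMacSketch {

    // Maps a literal IPv4 class D address to its Ethernet multicast address
    // by copying the low-order 23 bits into 01-00-5E-00-00-00.
    static String ethernetAddressFor(String ipv4Group) {
        byte[] ip;
        try {
            ip = InetAddress.getByName(ipv4Group).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
        // Class D means the top four bits of the first byte are 1110.
        if (ip.length != 4 || (ip[0] & 0xF0) != 0xE0) {
            throw new IllegalArgumentException(
                    "not an IPv4 class D address: " + ipv4Group);
        }
        // Only 7 bits of the second byte survive; the top bit is dropped.
        return String.format("01-00-5E-%02X-%02X-%02X",
                ip[1] & 0x7F, ip[2] & 0xFF, ip[3] & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(ethernetAddressFor("224.0.0.1"));   // 01-00-5E-00-00-01
        System.out.println(ethernetAddressFor("235.1.1.1"));   // 01-00-5E-01-01-01
        // Differs from 235.1.1.1 only in bits that are discarded, so it collides.
        System.out.println(ethernetAddressFor("224.129.1.1")); // 01-00-5E-01-01-01
    }
}
```

The last two lines show the collision RFC1112 mentions: two different host groups end up on the same Ethernet multicast address, so the IP layer still has to filter what the hardware lets through.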