The Java Specialists' Newsletter
Issue 122 2006-03-08
Category: Tips and Tricks
Java version: JDK 1.5
Copying Files from the Internet
by Dr. Heinz M. Kabutz
Abstract:
Sometimes you need to download files using HTTP from a machine
that you cannot run a browser on. In this simple Java program
we show you how this is done. We display the download
progress for those who are impatient, and look at how the
volatile keyword can be used.
Welcome to the 122nd edition of The Java(tm) Specialists' Newsletter. On Monday we had the
hottest day in Cape Town since they began keeping records in
1957. It was a sweltering 41 degrees Celsius at the airport,
and probably much hotter in the city centre. I took most of the
day off and spent it with a newsletter subscriber visiting me
from Amsterdam. We went to the second largest granite outcrop
in the world, which is quite close to where I live. If you ever
come to Cape Town, make it a priority to go see the Paarl Rocks.
We are busy streamlining our website to make it more
navigable. In addition, we are moving over to a dedicated
server, which should sort out all the downtime issues we have
had recently. Until that is complete, you might have the
misfortune of not getting through to javaspecialists.eu. We will send
out another newsletter when we have moved.
Copying Files from the Internet
Part of the job of installing our own dedicated server involves
downloading software from the internet onto our machine. I did
not want to punch a hole in my router to allow me to open up an
X session onto the server. Considering my slow internet
connection, I also did not want to first download the files onto
my machine, then upload onto the server.
A technique that I have used many times for downloading files
from the internet is to open up a URL, grab the bytes, and
add them to a local file. Here is a small program that does
this for you. You can specify any URL, and it will fetch the
file from the internet for you and show you the progress.
You can either specify the URL and the destination filename
or let the Sucker work that out for himself.
Some URLs can tell you how many bytes the content is, others
do not reveal that information. I use the Strategy Pattern to
differentiate between the two. We have a top level Strategy
class called Stats and two implementations,
BasicStats and ProgressStats.
The stats are displayed in a background thread. This means that
the Stats class has to ensure that changes to the fields are
visible to the background thread.
In my System.out.println() calls, I print a new Date() to
timestamp each step of the download. This is usually bad
practice. It would be better to format the time with a
DateFormat, which reduces the amount of processing needed to
display the date.
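As a rough illustration of the DateFormat suggestion, here is a minimal sketch that reuses one SimpleDateFormat instance for every log line. The class name LogStamp, the pattern and the method name are assumptions made for this example and are not part of the Sucker program. SimpleDateFormat is not thread-safe, so the method is synchronized in case the timer thread also logs.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of the original program.
class LogStamp {
    // One formatter, created once and reused for every line.
    private final SimpleDateFormat format;

    LogStamp(TimeZone zone) {
        format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        format.setTimeZone(zone);
    }

    // synchronized because SimpleDateFormat must not be shared
    // between threads without external locking.
    synchronized String line(Date when, String message) {
        return format.format(when) + " " + message;
    }

    public static void main(String[] args) {
        LogStamp stamp = new LogStamp(TimeZone.getDefault());
        System.out.println(stamp.line(new Date(), "Opening Streams"));
    }
}
```

Whether this is measurably faster than concatenating a Date depends on the JVM, so treat it as a tidier way to control the output format rather than a guaranteed optimisation.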
The last comment about this class is the size of the buffer. At
the moment it is set to 1MB. This is larger than necessary, so
the actual length returned by each read() will often be much
smaller than the buffer.
import java.io.*;
import java.net.*;
import java.util.*;

public class Sucker {
  private final String outputFile;
  private final Stats stats;
  private final URL url;

  public Sucker(String path, String outputFile) throws IOException {
    this.outputFile = outputFile;
    System.out.println(new Date() + " Constructing Sucker");
    url = new URL(path);
    System.out.println(new Date() + " Connected to URL");
    stats = Stats.make(url);
  }

  public Sucker(String path) throws IOException {
    this(path, path.replaceAll(".*\\/", ""));
  }

  private void downloadFile() throws IOException {
    Timer timer = new Timer();
    timer.schedule(new TimerTask() {
      public void run() {
        stats.print();
      }
    }, 1000, 1000);
    try {
      System.out.println(new Date() + " Opening Streams");
      InputStream in = url.openStream();
      OutputStream out = new FileOutputStream(outputFile);
      System.out.println(new Date() + " Streams opened");
      byte[] buf = new byte[1024 * 1024];
      int length;
      while ((length = in.read(buf)) != -1) {
        out.write(buf, 0, length);
        stats.bytes(length);
      }
      in.close();
      out.close();
    } finally {
      timer.cancel();
      stats.print();
    }
  }

  private static void usage() {
    System.out.println("Usage: java Sucker URL [targetfile]");
    System.out.println("\tThis will download the file at the URL " +
        "to the targetfile location");
    System.exit(1);
  }

  public static void main(String[] args) throws IOException {
    Sucker sucker;
    switch (args.length) {
      case 1: sucker = new Sucker(args[0]); break;
      case 2: sucker = new Sucker(args[0], args[1]); break;
      default: usage(); return;
    }
    sucker.downloadFile();
  }
}
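The listing above only closes its streams when the copy succeeds. As a sketch of the same read/write loop in isolation, the following example uses try-with-resources so that the streams are closed on failure as well. It assumes Java 7 or later, which is newer than the JDK 1.5 this newsletter targets. StreamCopy and its copy() method are hypothetical names introduced for this example, and in-memory streams stand in for the URL and the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of the original program.
class StreamCopy {
    // Copies everything from in to out and returns the byte count.
    // read() may fill only part of the buffer, so we always write
    // exactly the number of bytes it reported.
    static long copy(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int length;
        while ((length = in.read(buf)) != -1) {
            out.write(buf, 0, length);
            total += length;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[2500];
        // Both streams are closed automatically, even on an exception.
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            System.out.println(copy(in, out, 1024) + " bytes copied");
        }
    }
}
```

In the Sucker class the two streams would be url.openStream() and a FileOutputStream, with stats.bytes(length) called inside the loop.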
The Stats class needs a little bit of explaining. The field
totalBytes is written to by one thread, and read
from by another. Since we are writing with only one thread, we
can get away with just making the field
volatile. We have to make it at least
volatile to ensure that the timer thread
can see our changes.
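The single-writer condition matters because += on a volatile field is a read followed by a write, which is not atomic. If several threads were to call bytes(), one option would be java.util.concurrent.atomic.AtomicLong. The sketch below shows that alternative; it is not what the Stats class does, and the class SharedCounter and its methods are invented for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical multi-writer variant of the byte counter.
class SharedCounter {
    private final AtomicLong totalBytes = new AtomicLong();

    // Safe to call from any number of threads.
    void bytes(int length) {
        totalBytes.addAndGet(length);
    }

    long total() {
        return totalBytes.get();
    }

    // Runs two writer threads and returns the final count.
    static long countWithTwoWriters(final int addsPerThread) {
        final SharedCounter counter = new SharedCounter();
        Runnable writer = new Runnable() {
            public void run() {
                for (int i = 0; i < addsPerThread; i++) {
                    counter.bytes(1);
                }
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.total();
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoWriters(100000));
    }
}
```

With a plain volatile field and two writers, some increments could be lost. With one writer, as in Sucker, volatile is sufficient.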
The printf() statement "%10d KB%5s%% (%d KB/s)%n"
looks beautiful, does it not? The %10d means a decimal number
padded to a width of 10 characters, right justified. The "KB"
stands for kilobytes. The %5s means a String padded to a width
of 5 characters, right justified. Then we have a %%, which
represents the % sign. The newline is done with %n. Cryptic I
know, but for experienced C programmers this should read like
poetry :-)
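To see the padding in action, here is a small sketch that applies the same format string through String.format(). The class ProgressLine and its method are hypothetical names used only for this example. The trailing %n is left out so that the result does not depend on the platform line separator, and Locale.ROOT is passed so that the digits do not depend on the default locale.

```java
import java.util.Locale;

// Hypothetical helper that mirrors the format used in Stats.print().
class ProgressLine {
    static String format(long kilobytes, String percent, long kbPerSecond) {
        return String.format(Locale.ROOT, "%10d KB%5s%% (%d KB/s)",
            kilobytes, percent, kbPerSecond);
    }

    public static void main(String[] args) {
        // prints "         6 KB    2% (6 KB/s)"
        System.out.println(format(6, "2", 6));
        // prints "       322 KB  100% (46 KB/s)"
        System.out.println(format(322, "100", 46));
    }
}
```

Because both the number and the percentage are right justified in fixed-width columns, successive progress lines line up underneath each other.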
The Stats class contains a factory method that returns a
different strategy, depending on whether the content length is
known. Having the factory method inside Stats allows us to
introduce new types of Stats without modifying the context
class, in this case Sucker.
import java.net.*;
import java.io.IOException;
import java.util.Date;

public abstract class Stats {
  private volatile int totalBytes;
  private long start = System.currentTimeMillis();

  public int seconds() {
    int result = (int) ((System.currentTimeMillis() - start) / 1000);
    return result == 0 ? 1 : result; // avoid div by zero
  }

  public void bytes(int length) {
    totalBytes += length;
  }

  public void print() {
    int kbpersecond = (int) (totalBytes / seconds() / 1024);
    System.out.printf("%10d KB%5s%% (%d KB/s)%n", totalBytes/1024,
        calculatePercentageComplete(totalBytes), kbpersecond);
  }

  public abstract String calculatePercentageComplete(int bytes);

  public static Stats make(URL url) throws IOException {
    System.out.println(new Date() + " Opening connection to URL");
    URLConnection con = url.openConnection();
    System.out.println(new Date() + " Getting content length");
    int size = con.getContentLength();
    return size == -1 ? new BasicStats() : new ProgressStats(size);
  }
}
The ProgressStats class is used when we know the
content length of the URL, otherwise BasicStats
is used.
public class ProgressStats extends Stats {
  private final long contentLength;

  public ProgressStats(long contentLength) {
    this.contentLength = contentLength;
  }

  public String calculatePercentageComplete(int totalBytes) {
    return Long.toString((totalBytes * 100L / contentLength));
  }
}

public class BasicStats extends Stats {
  public String calculatePercentageComplete(int totalBytes) {
    return "???";
  }
}
Let's run the Sucker class. To download a picture of me at the
Tsinghua University in China, you would do the following:
java Sucker http://www.javaspecialists.eu/pics/TsinghuaClass.jpg
which produces the following output on my slow connection to the
internet:
Wed Mar 08 12:24:27 GMT+02:00 2006 Constructing Sucker
Wed Mar 08 12:24:27 GMT+02:00 2006 Connected to URL
Wed Mar 08 12:24:27 GMT+02:00 2006 Opening connection to URL
Wed Mar 08 12:24:27 GMT+02:00 2006 Getting content length
Wed Mar 08 12:24:27 GMT+02:00 2006 Opening Streams
Wed Mar 08 12:24:28 GMT+02:00 2006 Streams opened
         6 KB    2% (6 KB/s)
        56 KB   17% (28 KB/s)
       104 KB   32% (34 KB/s)
       158 KB   49% (39 KB/s)
       203 KB   63% (40 KB/s)
       257 KB   79% (42 KB/s)
       295 KB   91% (42 KB/s)
       322 KB  100% (46 KB/s)
When I tried downloading the latest Tomcat version from my
server, the speed was far more acceptable:
Wed Mar 08 11:25:52 CET 2006 Constructing Sucker
Wed Mar 08 11:25:52 CET 2006 Connected to URL
Wed Mar 08 11:25:52 CET 2006 Opening connection to URL
Wed Mar 08 11:25:52 CET 2006 Getting content length
Wed Mar 08 11:25:57 CET 2006 Opening Streams
Wed Mar 08 11:25:58 CET 2006 Streams opened
      1056 KB   18% (1056 KB/s)
      2272 KB   38% (1136 KB/s)
      3200 KB   54% (1066 KB/s)
      4121 KB   70% (1030 KB/s)
      5200 KB   89% (1040 KB/s)
      5829 KB  100% (1165 KB/s)
There are ways of running this through a proxy as well, which
you apparently do like this (according to my friends Pat Cousins
and Leon Swanepoel):
System.getProperties().put("proxySet", "true");
System.getProperties().put("proxyHost", "193.41.31.2");
System.getProperties().put("proxyPort", "8080");
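The system properties that the JDK networking documentation describes for HTTP proxies are http.proxyHost and http.proxyPort. Since Java 5 there is also the java.net.Proxy class, which can be passed to URL.openConnection() for a single connection. The sketch below shows both. ProxyConfig and its method names are hypothetical, the address is the one from the snippet above, and none of this has been tested against a real proxy.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper, not part of the original program.
class ProxyConfig {
    // JVM-wide setting, picked up by subsequent HTTP connections.
    static void useSystemProperties(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }

    // Per-connection setting: pass the result to url.openConnection(proxy).
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP,
            InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        useSystemProperties("193.41.31.2", 8080);
        System.out.println(httpProxy("193.41.31.2", 8080));
    }
}
```

With the per-connection form, Sucker would have to call url.openConnection(proxy).getInputStream() instead of url.openStream(), because openStream() takes no proxy argument.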
If you need to supply a password, you can do that by changing
the authenticator:
Authenticator.setDefault(new Authenticator() {
  protected PasswordAuthentication getPasswordAuthentication() {
    return new PasswordAuthentication(
        "username", "password".toCharArray());
  }
});
I have not tried this out myself, so use at your own risk :)
That is all for this week. Thank you for your continued support
by reading this newsletter, and forwarding it to your friends :)
Kind regards
Heinz