The Java Specialists' Newsletter
Issue 071 2003-06-01
Category: Language
Java version:
Overloading considered Harmful
by Alexander (Sascha) Höher
Welcome to the 71st edition of The Java(tm) Specialists' Newsletter. We have a new country
on our subscription list; please welcome the Faroe Islands (applause)!
I had never heard of them, but then again, my knowledge of geography
has never been good. There is a nice website about these
islands; you can have a look at www.faroeislands.com.
We are approaching 100 countries in our subscription list, and you
can see whether your country is already in our count by
looking at our web page under countries.
A few weeks ago, Alexander (Sascha) Hoeher sent me a piece he had
written in German on method overloading. The writing was persuasive
and interesting, so I asked Sascha if he could perhaps translate it
into English so we could feature his ideas on this newsletter.
The ideas were based on something that I had written about 2.5 years ago,
on Depth-first Polymorphism, but
Sascha added a dimension to the argument that
I had missed. He asks whether perhaps overloading is harmful? I
enjoyed both the German and the English version - I trust you will also.
The format is a bit different to my usual newsletters, but I encourage
you to spend the time reading through the prose and let us know what
you think of this.
Alexander Hoeher is based in Weilheim near Munich, Germany. After
getting into software development with C++, he found himself attracted
to Java, seeing its simplicity as a necessary basis for architecting
reliable complex systems. A Sun Certified Enterprise Architect by now,
his ideas about object-oriented design have been strongly influenced by
playing around with Eiffel and digging into UML literature (the latter
resulting in an IBM UML certificate). His current interests comprise
concurrent programming (formal process modeling), design patterns,
and XML related topics. He is fond of juggling, unicycling, and swimming,
and is a dedicated guitar player.
Overloading considered harmful
What is overloading, once again? Same method name for different
methods - sounds harmless enough!
Sure it's one of the first things Java programmers are
confronted with when learning the language. You are told things
like: Do not mix it up with overriding - remember, these
things may look quite similar, but are altogether different
concepts! Then your Java introduction goes on telling you
about unique parameter lists, and after one and a half pages
you get the impression that this is something not so terribly
hard to understand. [HK: I can vouch for this argument.
In my Java courses, students commonly make this mistake.]
What is the value proposition of this seemingly simple
feature?
Shorter interfaces, not bogged down by artificial, tiresome
discriminators, and a bit of lexical structuring of your
class text: Overloading allows you to indicate the conceptual
identity of different methods, letting you stress common
semantics across methods so that similarities become apparent
at first sight. It's supposed to make your code more readable,
and as far as server code is concerned (the code where these method
siblings are defined), it really does.
There are many who like it. There is tons of code using what
overloading has to offer. And of course, you cannot even escape
it in Java, where you're simply forced to use it when you want
to provide different initializers. It seems that overloading rules:
a feature not only popular, but tightly integrated into some
important programming languages, an open commitment of venerable
language designers that surely does not fail to impress the
masses. And, what is more: no performance penalty whatsoever...
Now, should we fully embrace overloading in our own code
then? Should we use it wherever possible? This discussion shall
present an attempt to put the technical facts investigated
in depth by an earlier edition of this newsletter into a usage
perspective - a bit similar in spirit to the popular harping
on pointers which you can find in every Java introduction. The
seminal idea that overloading clashes with dynamic binding
is taken from a discussion of overloading to be found in
"Object-Oriented Software Construction" by Bertrand Meyer.
There is no reason to question that naming conventions to
indicate conceptual interrelatedness of different methods will
benefit the class text where these methods are defined. To
adopt the convention of reusing the same method name, however,
has unfortunate consequences on client code which can become
quite unintuitive, to say the least.
Overloading with parameter lists of different lengths poses
no problem for client code interpretation, as the lengths openly
disambiguate client calls at first sight. Things that could
irritate you just will not compile. However, when overloaded
methods with the same method name have parameter lists of the
same length, and when the actual call arguments conform to more
than one signature of these overloaded methods, it somehow gets
a little hard to tell which methods are actually executed just
by looking at the client calls. In this situation, you experience
the strange phenomenon that the methods being called are not
independent of the reference types being used for the calls.
There are several problems related to this, but first let's
take another look at the small code example presented in an
earlier edition of this newsletter in order to really get a
feel for what it's like when methods being called are not
independent of the reference types being used for the calls.
A minimal modification allows us to focus on the ugly side
of overloading: The program still tells us which method
actually gets called, but on top of that also delivers rather
strong comments when overloading is caught harming our ability
to reason about the client code without knowing the server
classes.
Basically, we have two fixed instances, which will always play
the same roles: one serving as call target, the other serving
as argument. Now we mix and match several calls, always
executed on these same instances (always the same target
object, always the same argument object), the only difference
being the reference types through which these objects are
accessed. And behold: Different methods are being called. If
you are familiar with this simple setting, you may skip the
program part to directly go on with the following discussion.
public class OverloadingTest {
  public abstract static class Top {
    public String f(Object o) {
      String whoAmI = "Top.f(Object)";
      System.out.println(whoAmI);
      return whoAmI;
    }
  }

  public static class Sub extends Top {
    // Overloads (does not override) Top.f(Object)
    public String f(String s) {
      String whoAmI = "Sub.f(String)";
      System.out.println(whoAmI);
      return whoAmI;
    }
  }

  public static void main(String[] args) {
    Sub sub = new Sub();
    Top top = sub;                          // same object, other reference type
    String stringAsString = "someString";
    Object stringAsObject = stringAsString; // same object, other reference type

    // == is deliberate: each method returns its own string literal,
    // so identical references mean that the same method was executed.
    if (top.f(stringAsObject) == sub.f(stringAsString))
    //if (top.f(stringAsObject) == sub.f(stringAsObject))
    //if (top.f(stringAsString) == sub.f(stringAsString))
    //if (top.f(stringAsString) == sub.f(stringAsObject))
    //if (sub.f(stringAsString) == sub.f(stringAsObject))
    //if (top.f(stringAsString) == top.f(stringAsObject))
    {
      System.out.println("Hey, life is great!");
    } else {
      System.out.println("Oh no!");
    }
  }
}
Can you tell what happens when you activate each of the conditions?
Let us carefully go through the code.
- There are two overloaded methods spread across a class
hierarchy (one class inheriting from another class). This is
the server code to be called by the client.
The superclass defines: String f(Object o).
The subclass defines: String f(String s).
The signatures are chosen to make both methods eligible
candidates to be executed in the context of calls on the
subclass instance with a String argument.
- The client provides two objects, reused for all calls and
chosen in a way that both overloaded methods are potentially
eligible candidates for executing the client calls.
- Through polymorphic assignment, the client obtains references
of different types for these two instances.
- The client makes method calls that differ only in the
different references used for making the call. In the given
setup, there are 4 different call forms possible: Overloading
has the method name fixed, so only the target reference type
and the parameter reference type are variable. Every reference
type for the target can be combined with every reference type
for the argument. (Mathematically speaking, there are 4 binary
strings of length 2).
- The comparisons then are really just for fun, eliminating
detail. They shift the focus of attention from the question of
which particular method gets called to the general insight
that different methods get called, additionally allowing
the program to be explicit about its likes and dislikes:
Every case of seeming reference-independence of the calls
is instantly interpreted as an example of how things should
be, and welcomed with a happy, optimistic "Hey, life is
great!" In those dark moments, however, when overloading
casts its shadow upon the otherwise so object-oriented Java
world, and just nothing seems right, our little program starts
to complain... Combinatorics tells us that six 2-combinations of a
4-set (consisting of the 4 call forms) exist, and so you find
six comparisons (five of them showing up as comments), but of
course, a single predicate returning false (different
methods having been called) already suffices to get the point across.
And that's it.
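For readers who prefer a table to a chain of comparisons, the four call forms can also be listed side by side. The following is only a minimal sketch: the class name CallForms and the helper results() are made up for illustration, and Top and Sub merely mirror the server classes described above.

```java
// Hypothetical demo class; Top and Sub mirror the server classes of the article.
class CallForms {
    static class Top {
        public String f(Object o) { return "Top.f(Object)"; }
    }

    static class Sub extends Top {
        // Overloads f; it does not override Top.f(Object).
        public String f(String s) { return "Sub.f(String)"; }
    }

    // Returns the method selected for each of the four call forms.
    static String[] results() {
        Sub sub = new Sub();
        Top top = sub;                  // same object, different reference type
        String asString = "someString";
        Object asObject = asString;     // same object, different reference type

        return new String[] {
            top.f(asObject),   // Top reference, Object argument
            top.f(asString),   // Top reference, String argument
            sub.f(asObject),   // Sub reference, Object argument
            sub.f(asString)    // Sub reference, String argument
        };
    }

    public static void main(String[] args) {
        String[] labels = {
            "top.f(asObject)", "top.f(asString)", "sub.f(asObject)", "sub.f(asString)"
        };
        String[] r = results();
        for (int i = 0; i < r.length; i++) {
            System.out.println(labels[i] + " -> " + r[i]);
        }
    }
}
```

Only the last form reaches the String variant; the other three end up in Top.f(Object). Every comparison that pairs the last form with one of the others therefore produces the complaint, while the remaining comparisons look deceptively harmless.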
Discussion
The program shows, once again, that one thing to be aware of in
connection with overloading is that it's all about reference
types. This is as true for target reference types as it is
for parameter reference types. For instance, the predicate
"sub.f(stringAsObject) == sub.f(stringAsString)" will resolve
to false in our setup because two different methods are
executed. This dependence on reference types in connection
with overloading may or may not be what you expected, but the
question remains if this is a clean approach to object-oriented
programming.
No doubt, this may puzzle many a brave programmer, as it is
a result absolutely exclusive to overloaded methods. And,
as the use of overloaded methods does not identify itself
as such in the method call, the intuitive, but unfortunately
wrong expectation might be that the predicate returns true,
as it would be the case with any gentle non-overloaded method.
Honestly, do we like this? No. Object-oriented programming,
as we know it, is about objects, not about references. We
expect objects to behave the way they are and not the way
they are referenced. Objects do their thing regardless of the
role the client assigns them. This is how it should be,
and we call this thing dynamic binding. It is not cosmetics,
it is not just a feature, it is THE feature. It shapes the
architecture of our systems, decoupling clients from servers.
Now, with overloading a second rule, reference type dependence,
takes over, breaking the fundamental polymorphic equivalence
property described above (that polymorphic assignments do
not change the results of method calls as long as the code
can be compiled). The choice of references in the client,
which should be based on considerations like grouping and low
coupling, suddenly has to take the demands of overloading into
account. Overloaded server objects affect the design of client
code. Cosmetics beat structure. Unlike overriding, overloading
cannot simply be applied once in a server method definition,
end of story. It is a feature you have to stay aware of
in your clients whose specific referencing of server objects
influences what functionality gets called in the end. While with
dynamic binding alone the method to be executed is completely
server-defined, overloading proves to be client-sensitive.
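For contrast, here is a small sketch of the same setup with overriding instead of overloading. The names OverridingContrast and referenceIndependent() are hypothetical, and the @Override annotation is a later addition to Java (version 5) that is used here only to make the intent explicit.

```java
// Hypothetical contrast example: Sub overrides f(Object) instead of overloading it.
class OverridingContrast {
    static class Top {
        public String f(Object o) { return "Top.f(Object)"; }
    }

    static class Sub extends Top {
        // Same parameter type as in Top: this is overriding, not overloading.
        @Override
        public String f(Object o) { return "Sub.f(Object)"; }
    }

    // True if all four call forms end up in the same method.
    static boolean referenceIndependent() {
        Sub sub = new Sub();
        Top top = sub;
        String asString = "someString";
        Object asObject = asString;

        String expected = sub.f(asString);
        return expected.equals(top.f(asObject))
            && expected.equals(top.f(asString))
            && expected.equals(sub.f(asObject));
    }

    public static void main(String[] args) {
        System.out.println(referenceIndependent()); // prints true
    }
}
```

With dynamic binding alone, the reference types chosen by the client make no difference: every call form lands in Sub.f(Object), because the object decides.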
Now to the problems. An important issue closely connected to
software quality is readability. Our ability to reason about
the software text is essential for any kind of maintenance,
and, as you might have guessed by the direction this discussion
has taken by now, overloading affects readability of client
code in a rather negative way. It is all very well to let the
program run and after the surprise look at the server code
and explain the strange things away (oh, of course, overloaded
methods, you know...), but nevertheless it would be preferable
by far to predict the behaviour, simple as it is, by simply
(i.e. exclusively) examining the client. Show me the client
class, tell me no overloading is involved, and I will tell you:
"Hey, life is great!" I can reason about the result of the
condition solely by looking at the client class.
With overloading being introduced, or even with just the
slightest chance of overloading being used (this includes
all unknown Java code), this statement is impossible to make,
because you cannot tell if the same server method gets called
without examining the server sources. In our program, you
would have to read three classes instead of one to know what's
going on. So, use of overloading weakens the expressive power
of client code as the polymorphic equivalence property cannot
be relied upon.
Sometimes, of course, you are willing to dig into the server
code because you want to find out the exact server method
that gets called. But even then overloading significantly
complicates things. Without overloading, you just work your
way bottom up through the target's class hierarchy, and when
you find a match, bingo, you're done! With unknown code or
code known to use overloading, this can be only your second
step. First you have to examine the class of the reference and
find the matching method. Only then can you check the class
hierarchy for overriding methods. The bad thing about this is
probably not the additional step involved, but that you have
to repeat this analysis for every different reference type,
because results can vary. Thus, overloading complicates the
analysis of client-server interaction.
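The two-step analysis can be traced in a small sketch. The three-level hierarchy below (Top, Middle, Bottom) and the helper names are assumptions chosen for illustration; they are not part of the program discussed above.

```java
// Hypothetical hierarchy to trace the two analysis steps.
class TwoStepLookup {
    static class Top {
        public String f(Object o) { return "Top.f(Object)"; }
    }

    static class Middle extends Top {
        // An overload: new parameter type, so nothing is overridden here.
        public String f(String s) { return "Middle.f(String)"; }
    }

    static class Bottom extends Middle {
        @Override
        public String f(Object o) { return "Bottom.f(Object)"; }  // overrides Top.f(Object)

        @Override
        public String f(String s) { return "Bottom.f(String)"; }  // overrides Middle.f(String)
    }

    static String viaTop() {
        Top top = new Bottom();
        // Step 1: the reference type Top only knows f(Object).
        // Step 2: the object is a Bottom, which overrides f(Object).
        return top.f("x");
    }

    static String viaMiddle() {
        Middle middle = new Bottom();
        // Step 1: the reference type Middle knows f(Object) and f(String);
        //         for a String argument, f(String) is the best match.
        // Step 2: the object is a Bottom, which overrides f(String).
        return middle.f("x");
    }

    public static void main(String[] args) {
        System.out.println("via Top:    " + viaTop());
        System.out.println("via Middle: " + viaMiddle());
    }
}
```

Same object, same argument, and still the analysis has to be repeated for each reference type: through Top the call ends in Bottom.f(Object), through Middle it ends in Bottom.f(String).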
There is also a psychological dimension to all this. The
following will try to show that overloading is not a gentle,
unobtrusive language feature, but, as it stands in conflict
with other language features, late binding and inheritance,
particularly prone to abuse. In other words, overloading is
an open invitation for introducing conceptual errors. Think
of novice programmers or programmers in a rush. Overloaded
methods, coming with their own method selection rules, present an
anomaly in the object-oriented landscape shaped by the presence
of dynamic binding, and will surely go on to puzzle people,
who will falsely think overloaded methods behave like "normal"
methods, or mistake overloading for overriding just because
the method signatures involved in overloading look so similar.
In fact, such a mistake may be seen as expressing justified
desires regarding object-oriented design. Hell, we'd sure like
to see the overloaded methods in our example being handled as
an instance of overriding! The parameters of our methods are
related through inheritance, so inspired by other programming
languages, it does not take great imagination to see the derived
class define a method that overrides the inherited method. Of
course, this is an additional twist adding a bit of vision
to our discussion, and of course, we know that Java does
not support such covariant method redefinition (restricting
the parameter domain of the method): Most of us have learnt
by now that Java allows only specification inheritance
(overriding being only defined for methods with the same
return and parameter types). But still. Do we not think,
deep in our heart, that the subclass method with the more
specific parameter should, in a better world maybe, be the one
in charge, overriding the superclass method? Think about an
Integer class inheriting from Number while redefining addition
for integers only. Not allowed in Java, but still desirable
(and a real feature in other languages such as Eiffel). Sure,
overloading is not to be blamed for an incorrect understanding
of inheritance in Java, but it clearly invites such fantasies
(and the corresponding errors) when used in a context such as
the presented one. And even if such interpretations are wrong -
shouldn't they be right?
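The Integer and Number idea can be tried out in a few lines. This is a sketch under stated assumptions: Num and Int are hypothetical stand-ins (not java.lang.Number and java.lang.Integer), and the methods return a label instead of doing arithmetic.

```java
// Hypothetical stand-ins for a Number/Integer pair.
class CovariantWish {
    static class Num {
        public String add(Num other) { return "Num.add(Num)"; }
    }

    static class Int extends Num {
        // Looks like a redefinition "for integers only", but the parameter
        // type differs, so Java treats it as an additional overload.
        public String add(Int other) { return "Int.add(Int)"; }
    }

    static String viaNumReference() {
        Int a = new Int();
        Int b = new Int();
        Num n = a;          // same object, more general reference type
        return n.add(b);    // only add(Num) is known through Num
    }

    static String viaIntReference() {
        Int a = new Int();
        Int b = new Int();
        return a.add(b);    // add(Int) is the more specific match
    }

    public static void main(String[] args) {
        System.out.println("via Num: " + viaNumReference());
        System.out.println("via Int: " + viaIntReference());
    }
}
```

Through the Num reference the integer-only version is never reached, although both objects are Ints. In Java versions that offer the @Override annotation (5 and later), putting it on Int.add(Int) makes the compiler reject the class, which at least turns the fantasy into an error message.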
And then there is the poor integration of overloading and inheritance
in Java, which is very misleading as well. Reference type
dependence means that overloading is simply not developed to
conceptual consistency in the context of inheritance. Guessing
from experience with overloaded methods defined in one and
the same class, we might expect the method with the best
match in terms of formal parameter type and actual method
argument to be called on the object. This does not happen,
though. Java does not produce any kind of "flat form" for the
object's class with all overloaded methods, inherited or not,
appearing side by side in a list in order to allow the runtime
to choose the most appropriate.
No, what technically happens is, in my understanding,
that the compiler takes the method symbol plus the parameter
reference types of some method call and calculates a position
in the method table of the target's reference type. So,
choosing between overloaded methods is done at compile time,
and it is restricted to the overloaded methods of one class:
the class of the reference type. Overloaded methods defined
in subclasses of the reference type are never called: Java
ignores the exiled siblings although the whole thing looks so
very similar to overriding.
With overloaded methods being defined in superclasses of
the reference type, Java exhibits quite strange behaviour:
While the server code can still be compiled, client code will
break: Trying to make a method call where the compiler would
have to choose between them, you get a compilation error,
complaining that the call is ambiguous. Put the method into
the reference type and all is well. Don't ask me why - just
remember selection of overloaded methods is limited to the
reference type class. I personally believe this further anomaly
might be more a compiler issue than a language issue. If you
find a logical explanation for this, other than that it helps
to improve compilation performance, please let me know.
Once again (the last time): overloaded methods defined
in subtypes of the target reference will not be taken into
consideration as candidates for execution by the runtime. With
the table position given in the bytecode, the runtime will
only check if there are overriding methods (which will appear
at the same position in the method tables of subclasses if
they exist). So, the compiler cannot hunt them down, and the
runtime does not want to.
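As a rough analogy only, reflection can make the "method symbol plus parameter types" key visible. The sketch below is not how the compiler or the runtime works internally; it merely uses Class.getMethod, which looks up a public method by its name and exact parameter types. The class SignatureKey and the helper declaringClass() are hypothetical names.

```java
import java.lang.reflect.Method;

// Hypothetical demo: looks up f by name and exact parameter type.
class SignatureKey {
    static class Top {
        public String f(Object o) { return "Top.f(Object)"; }
    }

    static class Sub extends Top {
        public String f(String s) { return "Sub.f(String)"; }
    }

    // Reports which class declares f(paramType) as seen from refType,
    // or "none" if refType has no such method.
    static String declaringClass(Class<?> refType, Class<?> paramType) {
        try {
            Method m = refType.getMethod("f", paramType);
            return m.getDeclaringClass().getSimpleName();
        } catch (NoSuchMethodException e) {
            return "none";
        }
    }

    public static void main(String[] args) {
        System.out.println("Top, f(Object): " + declaringClass(Top.class, Object.class));
        System.out.println("Top, f(String): " + declaringClass(Top.class, String.class));
        System.out.println("Sub, f(Object): " + declaringClass(Sub.class, Object.class));
        System.out.println("Sub, f(String): " + declaringClass(Sub.class, String.class));
    }
}
```

Seen from Top, a method f(String) simply does not exist, no matter which subclasses define one. Unlike this exact-match lookup, the compiler also accepts a String argument for f(Object), so the analogy should not be stretched too far.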
A consequence of this is, disturbingly, that the place where
non-overridden overloaded methods are defined in the class
hierarchy is of essential importance for the selection
of the method actually being called. To me, this sounds a
little scary, or would you really want your class design to be
influenced by the crippled demands of overloading? Summing up:
Overloading is a static compile-time feature which does not
integrate well with our expectations shaped by dynamic method
lookup coming along with inheritance.
What else can we do to shoot the dead man? (Who is still alive
enough to ruin our programs, of course.) Bertrand Meyer sees
overloading as a violation of the "principle of non-deception:
differences in semantics should be reflected by differences
in the text of the software" (OOSC 94, Bertrand Meyer). But
wait a second, isn't late binding another case where there is
only one method symbol for different methods?
As I understand it, the difference between late binding and
overloading can be pinned to the observation that late binding
lets one method name be the pointer to one operation contract
(which then can be fulfilled by several different methods whose
differences are nevertheless absolutely transparent to the
client code), whereas overloading lets one method name be the
pointer for several method specifications whose differences can
be experienced in the client code. In the scope of the client,
there is no difference between polymorphic calls bound to
different methods. The polymorphic call specification is all
the client has to know about the call. Overloaded methods, on
the other hand, need not share common semantics, to be more
precise, a common contract, their pre- and postconditions
potentially varying wildly. This is something the client
always has to take into account: Overloaded methods cannot
be used interchangeably; as different methods merely gathered under
the same hood, they have to be treated according to their
specific contracts. These contracts, however, are hidden
behind the same name, which makes them hard to identify. The
same method name does not point to a common denominator
in this case, but only serves to disguise differences that
have to be laboriously disambiguated later on. The client has
to stay aware of the method contract being pointed to by a
complicated three component key for the method which, as we
have seen, consists of target reference type, method name,
and parameter reference types.
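A well-known pair of overloads in the collections library illustrates how contracts can differ behind one name: List.remove(int index) removes by position, while remove(Object o) removes by value. The sketch below assumes a Java version with generics and autoboxing (5 or later, with the diamond syntax of 7), which did not exist when this text was written; the class name DifferentContracts and its helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical demo of overloads with different contracts behind one name.
class DifferentContracts {
    static List<Integer> fresh() {
        return new ArrayList<>(Arrays.asList(3, 2, 1));
    }

    static List<Integer> listRemoveInt() {
        List<Integer> list = fresh();
        list.remove(1);                    // remove(int index): drops position 1
        return list;
    }

    static List<Integer> listRemoveInteger() {
        List<Integer> list = fresh();
        list.remove(Integer.valueOf(1));   // remove(Object): drops the value 1
        return list;
    }

    static List<Integer> collectionRemove() {
        List<Integer> list = fresh();
        Collection<Integer> c = list;      // same object, more general reference type
        c.remove(1);                       // Collection only has remove(Object): 1 is boxed
        return list;
    }

    public static void main(String[] args) {
        System.out.println(listRemoveInt());      // [3, 1]
        System.out.println(listRemoveInteger());  // [3, 2]
        System.out.println(collectionRemove());   // [3, 2]
    }
}
```

All three components of the key show up here: the method name stays the same, yet changing either the argument's type or the target's reference type selects a different contract.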
So what are my final words to the programmer who, after having
read this article, wonders if he should try to use overloading
now wherever possible or not? Keep going... And if you really,
really want to use it, go on and do so, but only with different
method names - this is a trick stolen from real experts that
can improve your overloading a lot! :o)
Sascha