Overview of Quercus Robusta Library
Overview
There are many various versions of notations and definitions
of the terms described below. Before reading please understand that while it is
perfectly valid to motivate one's definitions it is useless and counter-productive
to claim that some definition is incorrect (because it is definition
not statement). After all realists lost the battle to nominalists centuries ago.
While in many OO programming languages (e.g., C++, Java, Eiffel) inheritance
and sub-typing are lumped together, they are quite different things. To understand
this difference it is necessary to introduce some definitions:
There are some fundamental notions:
object, state, behavior, message, method and
value.
Object is identified sequence of values.
That is object is pair (id, seq) where id is some identity (not further specified)
unique among all objects and seq is mapping from ``instants of time'' to the
pairs of form (m,s) where m is the methods suit and s
is object's hidden internal state (which is in turn object?). That is seq:T -> M x O
Methods suite is mapping from language-dependent ``signatures'' to the methods (what is the method?).
In traditional languages there are two parts of the signature: one
that is used during method lookup (name, types and names of parameters and results etc.) and another
that is not: exceptions thrown, pre-post-conditions, concurrent access specifiers etc. This ``unused''
part is actually used but through separate language mechanisms: e.g., Java requires try/catch method
calls in accordance with their specification etc. It would be nice to unify.
Object has some hidden state and behavior
associated with this state. The only way to exhibit object's behavior is through sending
messages to it. Object maps messages to methods i.e., pieces of code acting over their
arguments and object's state. State of the object can change during object existence.
Value is object with it's state fixed.
That is value is a pair (o,s) where o is an object and s
is some possible state of the object o . Then one can look at the object as a collection
of values sharing the same identity. (Does identity belong to the state?) There should be
some structure imposed on this collection but in general (in the multi-threaded setting) there is no
absolute time so I am not sure that this structure should be one of the linear ordering.
For each object there is space of all possible values for this object (i.e., of all its
possible states) determined for example by its class and object itself is just some kind of
trajectory in this space.
Type system is some covering of the
class of all values. That is, the type system partitions all values into possibly overlapping
subsets called types. Power of the typing (i.e., existence of some type system built into the
language) is evident in languages with the static type-checking. In any conventional
language expressions produce values and are recursively built from simpler expressions.
In the statically typed languages most basic expressions (variables, constants etc.) have
types known at the compile time attached to them. Also, the statically typed language
enforces some restrictions on possible ways of forming expressions such that once
types of basic expressions are set, you can assign type to every valid expression
(It is not that easy in practice because there are recursive types etc., so type-checker
will probably need to employ some smart algorithm to check validness of some expressions
but the general idea is clear.) such that every evaluation of this expression will
produce value of that type. Now you can reason about the run-time behavior of the program
using the information about types of values produced by expressions.
If your type-system becomes more fine-grained, that is the more types you introduce, then
you get more information about run-time properties of programs; on the other hand
typing will be more complex, hard to understand and programs will be harder to change.
So it is necessary to make decision between complexity of type system and amount
of run-time information it catches.
As it was noted earlier objects perform
computations in response to the message sends. But it would be inconvenient if not impossible to
build objects that can response to the each and every possible message. As a matter of fact in
untyped language it is quite possible for object to receive message it doesn't understand. This
doesn't necessarily mean that program will crash, to the contrary some systems
(e.g., P. Curtis's LambdaMOO)
make smart use of such situations to implement otherwise impossible behavior. But in general,
to sensibly response to message you should understand it. Languages like Smalltalk rely on
programmer discipline, documentation and external checking tools (soft-typing) to guarantee
that every message send will be successfully resolved at run-time. But when building complex
software with code created by multiple developers scattered around space and time one feels
desire to get more robust mechanism to assert that execution will not end up with infamous
Concord.OilPressureSensor: message isSuitable:with: not understood
Especially when one happens to fly in this very Concord
The question naturally arises: can
we use type systems described above to guarantee that every message send will be resolved
to the method? The answer is yes. And the type-system required is remarkably clean and simple. To build
it we need the notion of the interface.
Interface is a collection of message
signatures. Message signature is language-dependent data that can be used to distinguish
various messages. That is given some object o information in the signature S
make it possible to predict the result of the lookup of any message of the form and message o.m
where message m has (or conforming to) signature S . In the simplest
case signature consists only of message name (selector). When language already enjoys some type
system signature may include types of arguments and return values, exceptions possibly thrown
during execution of method bound to this message, concurrency restrictions etc.
There is type TI naturally
associated with any interface I : value (o,s) belongs to TI
iff for every signature S in I and for every message m
conforming to S any message send o.m( ... ) will result in successful
lookup, that is iff object o can respond to all messages with signatures in
I . This doesn't imply that messages with signatures in I are the only
messages o can respond to.
Now if expression e has type
TI than any message send in the form e.m( ... ) where m
conforms to signature from I will be successfully resolved at run-time. And if
moreover such ``type-safe'' sends are the only sends allowed by language than any
message send in valid program will be successfully resolved and all messages will
be understood.
Of course this doesn't mean that message send have to be the only operation and types
of the form TI have to be the only types defined by the language.
If we limit ourselves to the language where the set of messages understood by the
object doesn't
depend on its state and is constant during object life-time (if for example method
lookup is really implicit and is not controlled by the user) than it is obvious that we can
lift interface-based typing from values to objects. That is we can assign types to the
objects themselves.
In languages with types like [ 0 .. 9 ] an object
may belong or not belong to such types depending on its state.
That is we have two separate but interconnected structures in the
language: typing and operations. One of the purposes of typing is to assure correctness
of operations. But operations themselves may be defined in terms of types. Of course
it is possible to construct type-system such that any expression will execute correctly
iff it is well-typed (it is enough to take all possible execution paths as types or
some-such), but such type system will be patently undecidable (it will be impossible
to build static type-checker).
Type-system based on interfaces
guarantees that operations of some kind (implicit operation of the method lookup during the
message send) will always succeed. But it also get us for free another valuable property:
polymorphism by inclusion.
Method lookup during message send is implicit operation and
array indexing is explicit. Interfaces give us type system expressible enough to assert the
correctness of the method lookup. One can imagine type system to assert the correctness of the
array indexing. It will be probably equivalent to some kind of the integer linear
programming. On the other hand there are issues like concurrent access to the shared state
which are much harder (?) to express as operations and to envisage a type system to
deal with them.
Polymorphism is the property of
parametrized code to work with parameters of different types. Polymorphism is divided
(by Cardelly and Wagner(sp?)) into two main classes: universal and ad hoc. Examples
of universal polymorphism are genericity (templates), virtual types and polymorphism by
inclusion. Examples of ad hoc polymorphism are coercion and overloading. Polymorphism is
the main mechanism for the code re-use (after cut-n-paste :-). And successful re-use is
what makes possible creation of large libraries and frameworks which are the foundation
of the large-scale software development.
As have been mentioned above types
can overlap. Type A is subtype of type B if A
is subset of B that is every value of type A is also
value of type B . Note that sub-typing is defined as relation on types
(not operation).
Suppose now, that message send is the only
operation in our language with interface types assigned to the objects and we have
code p(e:TI) parametrized over parameter e where e is
required to be an object of type TI . If we have an expression of the form
p(a) (This of course contradicts supposition that message send is the only
operation because p(a) is manifestly not message send; what we mean is message
send of the form x.f(a) where message f will be resolved to the method
p .) where the argument a is of type TI ; then well-typedness
of p means that for every message send of the form e.m( ... ) in
p message send a.m( ... ) will be successfully resolved. That is
if we call p with argument a conforming to the interface
I no message lookup will fail. But suppose we have object b
conforming to the interface I' where I' is I plus
some additional signatures. Then b evidently conforms to I and
therefore belongs to TI and during call p(b) no message
send will fail. That is code designed to work with parameter of type TI
will work with argument of type TI' where I' is
surinterface of interface I and TI' is
subtype of type TI . Great! We get mechanism to re-use code.
Concrete type system provides for re-use of code using only
operations that type system guarantees correctness of. For example interface-based
type system will not guarantee you that index will not fall out the end of array
or integer operations will not overflow. But different type system can. And you can
use all of them in the same time.
It is very important to understand that inclusion
polymorphism allows for re-use of client-side code: code that
is re-used is using code (methods) of object that is changed. This is to be
opposed to inheritance where code that is re-used is code of object that
is changed. Inheritance is for the server-side re-use.
Fast
All
|