Wednesday, August 4, 2010

Hyper-virtual methods in Java.

Here is what I think about virtual methods in Java.

In Java, by design and specification, all non-static, non-private, non-final, non-constructor methods are virtual. This means that the method invoked at a call site is selected according to the actual (runtime) type of the invoker object (the receiver), rather than its declared type.
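A minimal sketch of this rule follows. The class and method names (DispatchDemo, Shape, Circle, describe) are made up for illustration and do not come from any library. The instance method is selected by the runtime type of the receiver, while the static method is bound to the class named at the call site.

```java
// Hypothetical classes for illustration only.
class DispatchDemo {
    static class Shape {
        String name() { return "shape"; }               // virtual: can be overridden
        static String kind() { return "Shape.kind"; }   // static: never virtual
    }

    static class Circle extends Shape {
        @Override
        String name() { return "circle"; }              // overrides Shape.name()
        static String kind() { return "Circle.kind"; }  // hides Shape.kind(), does not override it
    }

    // The declared type of the parameter is Shape, but the body that runs
    // depends on the runtime type of the object passed in.
    static String describe(Shape s) {
        return s.name();
    }

    public static void main(String[] args) {
        Shape s = new Circle();             // declared type Shape, runtime type Circle
        System.out.println(describe(s));    // prints "circle"
        System.out.println(Shape.kind());   // prints "Shape.kind"
    }
}
```

Marking name() as final in Shape would make the override in Circle a compile-time error, which is how a Java programmer opts a method out of this behavior.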

In C++, this is true only when the invoker object is accessed through a pointer or a reference, and the method is explicitly declared 'virtual'. If either condition is false, the call is always resolved (that is, the definition is identified and selected) to the definition visible in the class of the declared (static) type of the invoker.

In Java, by contrast, there are no pointers, so methods have no way to exhibit virtual or non-virtual behavior depending on how the object is declared: there is only one way to refer to objects, and that is through references. Moreover, in JRE implementations, a Java object loses its connection with the declaring type and is associated only with its defining class. Given this, it became imperative to drop the virtual keyword and designate all normal methods as virtual.

But how often does a program really require the virtual property? Very rarely. What percentage of virtual methods exercise this feature in a meaningful manner? By my estimate, less than 5%. Even where multiple subclasses are designed and methods redefined, an efficient programmer will use an interface (or abstract class) as the base, which means the base method is pure virtual (abstract), not merely virtual.

This means that a normal, concrete Java method (virtual by design) actually making use of its virtual-ness is a rare occurrence.

Implementing virtual methods is easy in JREs, but their presence makes the execution engine incapable of pre-linking the method call site, potentially slowing down performance. In practice, method resolution has to wait until execution reaches the call site. Dynamic compilers devirtualize methods to an extent, by tracing the source of the invoker object in the neighborhood of the call site, but this does not really alleviate the problem, and it adds its own computation overhead. One of the challenges for JIT compilers today is the difficulty of performing inter-procedural analysis and compressing the code any further, owing to the extremely delayed method resolution. A powerful technique called ahead-of-time compilation is rendered largely ineffective by the inability to resolve methods in advance.
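To make the devirtualization idea concrete, here is a hand-written imitation of a guarded call, written as ordinary Java. It is only a sketch: a real JIT does this on compiled code using runtime profiles, and the details vary between JVMs. All names (GuardedCallDemo, Shape, Square, Rect, areaGuarded) are hypothetical, and the assumption that Square is the common receiver is made up for the example.

```java
// Hypothetical types; a source-level imitation of what a JIT might do internally.
class GuardedCallDemo {
    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        @Override public double area() { return side * side; }
    }

    static final class Rect implements Shape {
        final double w;
        final double h;
        Rect(double w, double h) { this.w = w; this.h = h; }
        @Override public double area() { return w * h; }
    }

    // Plain virtual call: the target is looked up from the receiver's class.
    static double areaVirtual(Shape s) {
        return s.area();
    }

    // Guarded version: a cheap type check, then the body of Square.area()
    // written out directly, with a fallback to the virtual call.
    static double areaGuarded(Shape s) {
        if (s.getClass() == Square.class) {
            Square q = (Square) s;
            return q.side * q.side;   // "inlined" body, no dispatch needed
        }
        return s.area();              // guard failed: ordinary virtual dispatch
    }

    public static void main(String[] args) {
        System.out.println(areaGuarded(new Square(3)));   // prints 9.0
        System.out.println(areaGuarded(new Rect(2, 5)));  // prints 10.0
    }
}
```

The guard itself is the extra computation mentioned above: the fast path is only a win when the guess about the receiver type is usually right.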

The decision to make all methods virtual was not a well-thought-out design but an unanticipated side effect: an accidental by-product of the pointer-less design.

Object leaks in Java.

Here is what I think of Java parameter passing conventions.

At the programmer's level, Java is said to pass objects by reference and primitives by value. Strictly speaking, what the callee receives for an object is its heap address, and the object reference itself is passed by value. This also means Java saves the space and effort of copying the entire object onto the subroutine linkage channel (for example, stack memory).

By definition, pass by reference means 'a parameter passing convention where the lvalue of the actual parameter (argument) is assigned to the lvalue of the formal parameter.'

When passed by reference, the callee method can manipulate the original object's attributes, invoke the object's methods, and re-create, re-assign and purge the components of a composite object thus passed. These operations are visible through the caller's original reference, because there is only one object in the heap, which is pointed to by both references.

To destroy an object, the C++ way is to 'delete' the object, and the C way is to 'free' the pointer. When objects are passed by reference or address, both languages have the flexibility to clean up an object or a structure from anywhere in the caller-callee chain. Invalidating an object indirectly invalidates other references or pointers cached elsewhere on the stack, and trying to reuse them is undefined behavior that typically results in a crash.

This is different in Java. Since there is no explicit freeing of objects, we rely on assigning null to the reference, which is the only explicit way to release an object for cleanup. Even after the callee nullifies its reference, the object lives on through the caller's reference. This means that an object cannot be freed (or marked for freeing) through one reference while a peer reference is alive, and vice versa.
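The behavior described so far can be checked with a small sketch. The names (PassingDemo, Box, mutate, reassign, nullify) are invented for the example. The callee can change the shared object, but re-assigning or nulling its own copy of the reference leaves the caller's reference untouched.

```java
// Hypothetical names; illustrates that references are passed by value.
class PassingDemo {
    static class Box {
        int value;
        Box(int value) { this.value = value; }
    }

    // Mutates the one object in the heap that both references point to.
    static void mutate(Box b) {
        b.value = 42;
    }

    // Re-assigns only the callee's local copy of the reference.
    static void reassign(Box b) {
        b = new Box(99);
    }

    // Nulls only the callee's local copy of the reference.
    static void nullify(Box b) {
        b = null;
    }

    public static void main(String[] args) {
        Box box = new Box(1);

        mutate(box);
        System.out.println(box.value);    // prints 42: the shared object changed

        reassign(box);
        System.out.println(box.value);    // prints 42: caller still sees the old object

        nullify(box);
        System.out.println(box != null);  // prints true: caller's reference is intact
    }
}
```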

This may be a conscious design to eliminate bad references and to make sure that every object reference is either null or a valid object's address. The reason is that, under garbage collection, the memory of unreferenced objects is not really returned to the system; it is kept in an internal free pool, remains mapped into the process, and would be accessible through stale references. Such a dangling pointer would actually cause more damage than a crash.

But then how do we clean up an unwanted Java object? Setting the object reference to null and waiting for a GC to occur might not work: if there is a second reference elsewhere in the stacks or registers, held consciously or unknowingly, the object is not collected. Consequently, many of the objects the programmer has explicitly discarded will linger in the heap until the last reference to the object also goes out of scope. This may happen sooner, later, or never.
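A small sketch of such a lingering object, using java.lang.ref.WeakReference as a probe that does not itself keep the object alive. The class name, the CACHE list and createAndDiscard are hypothetical; the list stands in for any forgotten second reference such as a cache or a listener registry. Note that System.gc() is only a hint, so the sketch makes no promise about when the object is finally collected.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; CACHE plays the role of a forgotten second reference.
class LingeringReferenceDemo {
    static final List<Object> CACHE = new ArrayList<>();

    // Creates an object, stores a second reference to it, then "discards" it.
    static WeakReference<Object> createAndDiscard() {
        Object obj = new Object();
        CACHE.add(obj);                                   // the easily forgotten reference
        WeakReference<Object> probe = new WeakReference<>(obj);
        obj = null;                                       // the programmer's explicit discard
        return probe;
    }

    public static void main(String[] args) {
        WeakReference<Object> probe = createAndDiscard();

        System.gc();                                      // a request, not a command
        // Still strongly reachable through CACHE, so it cannot have been collected.
        System.out.println(probe.get() != null);          // prints true

        CACHE.clear();                                    // drop the last strong reference
        System.gc();
        // Collection is now permitted, but its timing is up to the JVM.
        System.out.println(probe.get() == null ? "collected" : "not collected yet");
    }
}
```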

Many memory leaks, including the infamous classloader leaks, can be attributed to this 'hidden and under-documented' behavior of Java. And this is the very reason we see more OutOfMemoryErrors than NullPointerExceptions.

Garbage generation in Java.

Here is what I think of Java garbage collection:

In Java programs, the use of pointers is forbidden by virtue of a design strategy or a security policy. Without pointers, functions cannot access objects across stack frames, among many other limitations. The inability to pass objects to and from functions would limit the scope of a programming language at large. To remedy this, user-defined objects in Java are inherently passed by address (termed a reference), in contrast to C and C++, where passing arguments by their addresses is a deliberate choice.

Conventionally, when arguments are passed by value, what the callee receives is an isolated copy of the passed object. In C, when arguments are passed by address, the callee can manipulate the caller's arguments. In C++ the same applies, along with call by reference. User objects are normally created on the stack. In producer functions, where the function generates and returns an object by address, the allocation has to be made in the heap (returning the address of a locally created object leaves a dangling reference). Such cases are not frequent, so one can manually free the object that was 'newed'. The two modes of creating user objects are:

Class obj;                   => object created on the stack.
Class *obj = new Class();    => object in the heap, pointer on the stack.

In Java, without pointers, the language semantics do not allow the above flexibility, and there is only one way to create objects: either everything on the stack or everything in the heap, not both. Creating all objects on the stack is a bad choice, since objects whose life span exceeds that of the defining method would be destroyed when the frame is popped off as the function returns, essentially forbidding methods from returning generated objects and making Java an incomplete language. As a workaround, all objects are created in the heap. Now, as a matter of fact, it is difficult for a programmer to delete all the objects he 'newed', which are quite many, in fact most of them.
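The producer-function case looks like this in Java. The names (ProducerDemo, Point, makePoint) are invented for the sketch. The object is created inside the method, outlives the method's stack frame, and is never explicitly deleted by the caller.

```java
// Hypothetical names; shows an object outliving the frame that created it.
class ProducerDemo {
    static class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The local variable p disappears when the frame is popped,
    // but the object it refers to stays valid for the caller.
    static Point makePoint(int x, int y) {
        Point p = new Point(x, y);
        return p;
    }

    public static void main(String[] args) {
        Point p = makePoint(3, 4);
        System.out.println(p.x + "," + p.y);   // prints 3,4
        // There is no delete here; once p is unreachable the collector may reclaim it.
    }
}
```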

Hence the garbage and hence the collector.

In a non-Java programming paradigm, it is like allocating memory at arbitrary heap locations, and later scanning the entire virtual memory to clean up the filth.

Garbage collection is not a Java feature. It is a compromise. A consequence of refraining from pointers. A skillful attempt to mend a defect. An unchecked Sun heredity and an unbridled software hypothesis which we have carried and dragged all the way along.