The purpose of this article is to discuss some poorly understood mechanisms of the Java Virtual Machine. If you already know all of them, you should contact our HR department about working at FastConnect.
So please consider the following program. If we run it with a Java heap limited to 8 MB (with the option -Xmx8m), it fails with an ‘OutOfMemoryError’. But sometimes the failure happens after 1000 iterations, and other times after only 100. It seems random.
class Padding
{
    private byte[] padding;

    Padding() {
        this.padding = new byte[1024*1024]; // 1 MB
    }

    // Dummy work so that finalize() is not empty.
    @Override
    protected void finalize() throws Throwable {
        this.padding[0] = this.padding[1];
    }
}

public class Main
{
    public static void main(String[] args) throws Exception {
        int cpt = 0; // number of Padding objects allocated so far
        try {
            for(;; ++cpt) {
                new Padding(); // becomes garbage immediately
            }
        } catch (OutOfMemoryError oom) {
            System.err.println("oom after "+cpt+" loops.");
            throw oom;
        }
    }
}
Here is a little quiz. Can you explain this behavior? Is there a leak in the program? Why is the number of iterations before the OOM random?
Before reading on, I suggest you run this little program yourself. You may have to increase the size of the padding array to reproduce the behaviour described.
The first question is not very difficult. The random behavior is due to the garbage collector. To provide good performance, recent Java VMs manage memory concurrently with the execution of your code. That is why, if the Java heap is not big enough, you can get an OOM even if there is no leak, and even if the live (reachable) objects always take up less space than the Java heap.
That’s why you must never be stingy with memory. A Java program needs plenty of free memory for memory management to be safe and efficient.
The next questions are trickier. If we remove the ‘finalize method’, the program works fine, even with a very small heap. Can you explain why?
class Padding
{
    private byte[] padding;

    Padding() {
        this.padding = new byte[1024*1024]; // 1 MB
    }
}

public class Main
{
    public static void main(String[] args) throws Exception {
        for(;;) {
            new Padding();
        }
    }
}
Again, I suggest you run this program. If you open ‘jconsole’ or ‘jvisualvm’, you should see very unusual memory activity: the curve is flat (apart from a few rare spikes). So, I have two questions:
- Why is the curve of this very simple program so flat?
- Why does the existence of a ‘finalize method’ disrupt memory management?
As a Java expert, I expect you have already read (several times) the documentation about the Java garbage collector, its tuning and its algorithms.
The default GC uses a generational algorithm in which allocation in the Eden space is a simple stack-like allocation: a pointer is advanced. With our simple program, Padding objects are never promoted from the young generation to the tenured generation during a collection. This means that objects are allocated quickly, as on a stack. At some point later the copying collector is triggered and it copies nothing, because nothing is still alive. In this particular (but not rare) situation, Java memory management is more efficient than C++ malloc because allocations and deallocations are stack-like and batched.
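To make the "stack allocation" idea concrete, here is a toy model of a bump-pointer Eden. It is only a sketch under simplifying assumptions, not the JVM's actual code: the class ToyEden and its methods are hypothetical names, memory is represented by plain offsets, and every object is assumed to be dead when a collection starts (which is the case for our Padding objects without a finalizer).

```java
/** Toy model of a bump-pointer "Eden" space. Hypothetical names, not JVM internals. */
final class ToyEden {
    private final int capacity;  // size of the space in bytes
    private int top = 0;         // offset of the next free byte
    private int collections = 0; // number of collections triggered so far

    ToyEden(int capacity) {
        this.capacity = capacity;
    }

    /** Allocates size bytes by advancing a pointer and returns the start offset. */
    int allocate(int size) {
        if (size > capacity) {
            throw new IllegalArgumentException("object larger than Eden");
        }
        if (top + size > capacity) {
            collect(); // Eden is full: trigger a collection first
        }
        int start = top;
        top += size;   // the whole allocation is this pointer bump
        return start;
    }

    /** Copying collection with zero survivors: nothing to copy, just reset the pointer. */
    private void collect() {
        collections++;
        top = 0;
    }

    int collections() {
        return collections;
    }

    public static void main(String[] args) {
        ToyEden eden = new ToyEden(8 * 1024 * 1024);
        for (int i = 0; i < 1000; i++) {
            eden.allocate(1024 * 1024); // same pattern as "new Padding()"
        }
        System.out.println("collections: " + eden.collections());
    }
}
```

In this model both allocation and "deallocation" cost almost nothing: one addition per object, and one pointer reset per batch of dead objects. That is the intuition behind the flat curve.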
I hope my explanations were clear. If not, you should read the following article: Java theory and practice: Urban performance legends, revisited.
But we have not answered the last and trickiest question: why does the existence of a ‘finalize method’ disrupt memory management? A hint is in the title of this article. Please consider the following program:
class Padding
{
    // Objects added here become reachable again and can never be reclaimed.
    static java.util.Set<Padding> retention = new java.util.HashSet<Padding>();
    private byte[] padding;
    private int id;

    Padding(int id) {
        this.id = id;
        this.padding = new byte[1024*1024]; // 1 MB
    }

    @Override
    protected void finalize() throws Throwable {
        System.err.println("finalize is called for Padding[id="+id+"]");
        retention.add(this); // resurrects the object
    }
}

public class Main
{
    public static void main(String[] args) throws Exception {
        int cpt = 0; // number of Padding objects allocated so far
        try {
            for(;; ++cpt) {
                new Padding(cpt);
            }
        } catch (OutOfMemoryError oom) {
            System.err.println("oom after "+cpt+" loops.");
            throw oom;
        }
    }
}
Again, I suggest you test it several times with different heap sizes (-Xmx64m for example). Its behaviour is very stable: for a given heap size, it always fails after the same number of iterations.
On blogs and forums, I often read “finalize is called when the object is reclaimed”. This is wrong! When a finalizable object is no longer referenced, the JVM adds it to its finalization queue. At some point later, the JVM’s finalizer thread dequeues it, calls its ‘finalize method’, and records that its finalizer has been called. Only when the garbage collector later rediscovers that the object is unreachable does it reclaim it.
This behaviour is a problem: it makes the garbage collector’s code very complex, it is error-prone for JVM implementers, and it consumes a lot of memory. It is the main reason why finalization is not part of the Java specifications for mobile phones and (credit) cards.
Moreover, there is no good use of Java finalization. For example, if you rely on a ‘finalize method’ to close sockets, you will have a big problem when you have a lot of free memory but not enough file descriptors: the garbage collector has no reason to run, so the sockets are never closed. Garbage collectors are good at managing managed memory, and only that.
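The lifecycle described above can be sketched as a small simulation. This is a toy model, not JVM code: the class ToyFinalization, its State enum and its methods are hypothetical names, and reachability is just a boolean flag set by hand. It only shows that a finalizable object needs at least two collections, with a finalizer run in between, before its memory is given back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy simulation of the finalization lifecycle. Hypothetical names, not JVM internals. */
final class ToyFinalization {
    enum State { LIVE, QUEUED, FINALIZED, RECLAIMED }

    static final class Obj {
        State state = State.LIVE;
        boolean reachable = true; // set to false to simulate dropping the last reference
    }

    private final List<Obj> heap = new ArrayList<>();
    private final Queue<Obj> finalizationQueue = new ArrayDeque<>();

    Obj allocate() {
        Obj o = new Obj();
        heap.add(o);
        return o;
    }

    /** One collection: unreachable objects are queued first, reclaimed only once finalized. */
    void gc() {
        for (Obj o : heap) {
            if (o.reachable) {
                continue;
            }
            if (o.state == State.LIVE) {
                o.state = State.QUEUED;      // first discovery: memory is NOT freed yet
                finalizationQueue.add(o);
            } else if (o.state == State.FINALIZED) {
                o.state = State.RECLAIMED;   // rediscovered after finalization
            }
        }
        heap.removeIf(o -> o.state == State.RECLAIMED);
    }

    /** Stands in for the finalizer thread, which runs at its own pace. */
    void runFinalizerThread() {
        Obj o;
        while ((o = finalizationQueue.poll()) != null) {
            o.state = State.FINALIZED; // finalize() would run here and could resurrect the object
        }
    }

    int heapSize() {
        return heap.size();
    }

    public static void main(String[] args) {
        ToyFinalization vm = new ToyFinalization();
        Obj o = vm.allocate();
        o.reachable = false;
        vm.gc();
        System.out.println("after first gc: " + o.state + ", heap size " + vm.heapSize());
        vm.runFinalizerThread();
        vm.gc();
        System.out.println("after second gc: " + o.state + ", heap size " + vm.heapSize());
    }
}
```

If the allocating thread is faster than the finalizer thread, objects pile up in the QUEUED state while still occupying the heap, which is consistent with the random OOM of the first program.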
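One way to avoid depending on the garbage collector for OS resources is to release them explicitly. Here is a minimal sketch, assuming Java 7 or later (for AutoCloseable and try-with-resources). The class TrackedResource is hypothetical: it stands in for a socket or a file, and its counter plays the role of the file descriptor table.

```java
/** Hypothetical resource that must be released explicitly (stands in for a socket or a file). */
final class TrackedResource implements AutoCloseable {
    private static int open = 0;    // number of resources currently open
    private boolean closed = false;

    TrackedResource() {
        open++; // acquiring the resource
    }

    void use() {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
    }

    @Override
    public void close() {
        if (!closed) {  // idempotent: closing twice is harmless
            closed = true;
            open--;
        }
    }

    static int openCount() {
        return open;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            try (TrackedResource r = new TrackedResource()) {
                r.use();
            } // close() runs here, whether or not use() threw
        }
        System.out.println("still open: " + openCount());
    }
}
```

With this style the number of open resources is bounded by the code's structure, not by how often the garbage collector happens to run.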
You can refer to this article which explains how finalization works and how not to use it.
Cyril Martin (mcoolive).

 
Interesting article!
In .NET we have a standard and widely used interface to deal with deterministic cleanup of unmanaged resources: IDisposable. This concept is even supported at the language level, through the “using” keyword in C# for example. But still, there are many ways to implement it the wrong way… The idea is to follow the disposing pattern (http://msdn.microsoft.com/en-us/library/b1yfkh5e%28VS.71%29.aspx), which relies both on the finalizer and the explicit cleanup entry point (the Dispose method of IDisposable).
The last article you link to seems to describe how to implement such a “disposing pattern” in Java, but I don’t know if it’s a standard practice…
Is there some kind of well known java convention in this area?
I think the good practices are the same in Java and .NET because the problem is the same: how to manage OS resources.
Providing explicit control (the dispose pattern) is the simplest way to solve this problem.
Thanks for your post… I used to see “too many open files” errors under very heavy network load (only on Linux systems) in a Java app I was using and working on every day… but I never had enough time to investigate the problem… Since this app is open source, I will check its code for “bad finalization idioms”.
Anyway, thanks again for the explanation.
On Unix machines, many “system things” are manipulated as files, in particular TCP connections. By default, Linux limits the number of open files to 1024. See the help for “ulimit”.
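If you want to watch this limit from inside the JVM, here is a minimal sketch. It assumes a HotSpot/OpenJDK runtime on a Unix system, because it relies on com.sun.management.UnixOperatingSystemMXBean, which is a JDK-specific extension and not part of the standard Java SE API. The class FdUsage and its method are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper that reports file descriptor usage of the current JVM process. */
final class FdUsage {
    /** Returns {open, max} descriptor counts, or null when the runtime does not expose them. */
    static long[] fileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // assumption: non-Unix or non-HotSpot runtime
    }

    public static void main(String[] args) {
        long[] fds = fileDescriptors();
        if (fds == null) {
            System.out.println("file descriptor counts are not available on this runtime");
        } else {
            System.out.println("open: " + fds[0] + ", max: " + fds[1]);
        }
    }
}
```

Logging these two numbers periodically during a load test is one way to see whether connections are released or keep accumulating.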
Maybe your application doesn’t free TCP connections in the best way. Or maybe it consumes a lot of connections by design: for example Nirvana (a MOM) or GigaSpaces (a data grid) can consume more than 8000 TCP connections and it is a “normal behavior”.
The application I was working on at the time was a distributed bus. By design, it uses a lot of TCP connections too, but they appear to be improperly released by some of its components.
I used to test this bus to the limit using high ulimit values and sensible system-wide descriptor limits on a 10GE network. The injectors were then set with a high thread count, no sleep time and small payloads (to maximize throughput).
After a few minutes, the system did not appear to recycle its sockets properly… and no more messages could effectively be sent by the bus because of persistent “too many open files” errors (messages were then persisted). lsof indicated that thousands of sockets remained in the CLOSE_WAIT state.
However, we didn’t encounter this problem in real-world conditions. Your post simply reminded me of it.