KalyanChakravarthy.net

Java synchronization over multiple shared objects

When developing a multi-threaded application in Java, or in any threaded environment for that matter, one of the first issues to consider is the synchronization of shared resources.

Synchronizing access to a shared object or resource in Java is as straightforward as using the "synchronized" keyword, either on a method declaration or on a code block:

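public class Foo {
    // Shared resource; also serves as the lock object for Way 2
    private final Object boo = new Object();

    // Way 1: synchronized method, locks on this instance
    public synchronized void bar() {}

    // Way 2: synchronized block, locks on the shared object
    void baz() {
        synchronized (boo) {
            // some synchronized code
        }
    }
}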
Whether to make the whole method synchronized or to wrap only the critical section in a synchronized block over the shared resource is up to the programmer. For a single shared object, a synchronized block works just fine. But when more than one object is shared by multiple threads, one important rule often slips the mind, and forgetting it leads to deadlocks: acquire the locks in the same order everywhere in the application.
Consider (the bad way):

final Object foo = new Object();
final Object bar = new Object();
Runnable tObj = new Runnable() {
    public void run() {
        while (true) {
            // sync #1: locks foo, then bar
            synchronized (foo) {
                synchronized (bar) {
                    // synchronized code
                }
            }
            // More code
            // Some more code
            // sync #2: locks bar, then foo (opposite order, deadlock-prone)
            synchronized (bar) {
                synchronized (foo) {
                    // synchronized code
                }
            }
        }
    }
};
Thread t1 = new Thread(tObj);
Thread t2 = new Thread(tObj);
t1.start();
t2.start();

Note the different lock order in sync #1 and sync #2. As execution starts, the first thread t1 obtains the lock on foo in the first block, and then the lock on bar if it is free as well. Until t1 leaves the synchronized block, the second thread t2 is kept waiting for those locks.

If every thread were guaranteed equal execution time on each cycle, this would be unlikely to pose a problem, as the threads would not fall out of phase. More often than not that is not the case: each thread gets a different amount of CPU time on each cycle, and if the thread depends on external resources such as a database, files or sockets, the order of execution is unpredictable.

Once execution falls out of phase, thread t1, executing sync #1, obtains the lock on foo and requests the lock on bar. But the other thread t2, already out of phase, might be executing sync #2 and have already obtained the lock on bar. t1 waits for t2 to release bar, while t2 waits for t1 to release foo. Each thread is waiting for a lock held by the other, so neither can proceed: a deadlock.

In a real application, the deadlock does not appear immediately. It only occurs at an instant when the threads happen to be executing out of phase in just this way, and depending on the application that might take minutes or even hours.
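
Because waiting hours for the bad interleaving is impractical, it can help to force it. The following is a minimal sketch, not the original crawler code: the class name DeadlockDemo and its methods are made up for illustration, and the CountDownLatch exists only to guarantee that each thread holds its first lock before asking for the second. It then asks the JVM, through ThreadMXBean, whether it sees threads deadlocked on monitors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class: forces the foo/bar deadlock described above and detects it.
public class DeadlockDemo {
    private static final Object foo = new Object();
    private static final Object bar = new Object();

    /** Starts two threads that lock foo and bar in opposite orders; returns true if the JVM reports a deadlock. */
    public static boolean provokeAndDetect() {
        // Released only after both threads hold their first lock.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread t1 = new Thread(() -> lockInOrder(foo, bar, bothHoldFirstLock), "t1");
        Thread t2 = new Thread(() -> lockInOrder(bar, foo, bothHoldFirstLock), "t2");
        // Daemon threads, so the stuck threads do not keep the JVM alive.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Poll for up to about five seconds; the call returns null while no deadlock exists.
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null && ids.length >= 2) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Not reached: the other thread holds this lock and is waiting for ours.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + provokeAndDetect());
    }
}
```

In a long-running program, the same ThreadMXBean check could be run periodically from a watchdog thread, or a thread dump could be taken when the program appears to hang.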

I came across such a situation while writing a Java-based multi-threaded web crawler, which maintained two shared objects: a Queue and a HashSet. The Queue contained all URLs still to be parsed, and the HashSet contained the set of completed URLs. When tested with one thread it worked fine, for the obvious reason that there were no competing threads. But when the number of threads was increased to two, it took about 10 minutes for the deadlock to happen and the program to halt. Increasing the number of threads further made the deadlock appear a bit sooner, because more threads mean a higher probability of concurrent lock requests.

One of the better ways of staying safe is to wrap the locking logic inside a single synchronized method. Even then, if several methods lock more than one object, make sure each of them acquires the locks in the same order.
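
As a rough sketch of what that could look like for a crawler with a Queue and a HashSet: the class CrawlFrontier and its method names below are hypothetical, not the original crawler code. Both collections are private and are only touched from synchronized methods, so a single lock (the instance itself) guards them and there is no second lock to acquire in the wrong order.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical wrapper: one lock (this) guards both the pending queue and the completed set.
public class CrawlFrontier {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Set<String> completed = new HashSet<>();

    /** Queues a URL unless it is already pending or completed; returns true if it was queued. */
    public synchronized boolean offer(String url) {
        if (completed.contains(url) || pending.contains(url)) {
            return false;
        }
        pending.add(url);
        return true;
    }

    /** Returns the next URL to parse, or null if nothing is pending. */
    public synchronized String poll() {
        return pending.poll();
    }

    /** Records a URL as done so that it is not queued again. */
    public synchronized void markCompleted(String url) {
        completed.add(url);
    }

    public static void main(String[] args) {
        CrawlFrontier frontier = new CrawlFrontier();
        frontier.offer("http://example.com/");
        String next = frontier.poll();
        frontier.markCompleted(next);
        System.out.println("queued again: " + frontier.offer(next));
    }
}
```

This sketch does not track URLs that have been polled but not yet completed, and worker threads would call only these three methods rather than locking the collections themselves.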

Similarly, deadlocks can occur when the nested synchronization happens inside a loop, i.e. a synchronized block inside a loop that is itself inside another synchronized block, if another code path takes the same locks in the opposite order.
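
The ordering rule carries over to the loop case unchanged. Here is a small sketch under assumed names (OrderedLoopLocks, loopPath, plainPath and runBoth are invented for illustration): one path takes bar repeatedly inside a loop while holding foo, the other nests the two locks directly, and both always take foo before bar, so two threads running them concurrently cannot block each other in a cycle.

```java
// Hypothetical example: every code path acquires foo first, then bar.
public class OrderedLoopLocks {
    private static final Object foo = new Object();
    private static final Object bar = new Object();
    private static int counter = 0; // only modified while holding both locks

    // Nested lock inside a loop, itself inside an outer synchronized block.
    static void loopPath(int iterations) {
        synchronized (foo) {
            for (int i = 0; i < iterations; i++) {
                synchronized (bar) {
                    counter++;
                }
            }
        }
    }

    // Plain nesting, same order as loopPath: foo, then bar.
    static void plainPath() {
        synchronized (foo) {
            synchronized (bar) {
                counter++;
            }
        }
    }

    /** Runs both paths from two threads and returns the final counter value. */
    public static int runBoth(int rounds) {
        counter = 0;
        Runnable work = () -> {
            for (int r = 0; r < rounds; r++) {
                loopPath(10);
                plainPath();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter;
    }

    public static void main(String[] args) {
        // Two threads, each doing 1000 rounds of 10 + 1 increments.
        System.out.println(runBoth(1000)); // prints 22000
    }
}
```

Swapping the order in plainPath to bar then foo would reintroduce the situation described above.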