Programming concurrently with threads

Software that runs faster if run on more CPUs in parallel is cool. To create an algorithm so it runs fast and efficient on one single CPU can be tough enough, and to create one that is efficient and that benefits from a multi-processor system is an art and a true craftsmanship. Threads are a nice way to program in parallel and my favorite implementation is Sun Solaris threads. Lightweight threads with a 2-level scheduler.

Programming with parallel threads can be a bit tricky. Especially if the different threads are to access and use the same shared data. We must ensure that certain accesses to shared data structures can't be interrupted by other threads which could lead to inconsistent and corrupt data. Locking is a clever way to do this. The thread that got the lock have access to the shared structure. If other threads want to access the structure at the same time they will have to wait for the lock to be released.

Note: All Java examples on this page requires JDK1.2 or better. ArrayLists are not supported by JDK1.1 but the examples can easily be converted by using Vector instead.

Mutexes

The simplest form of lock is called a mutex. In Solaris and other POSIX compatible thread implementations there exists a few system calls to handle mutexes. pthread_mutex_init(), pthread_mutex_destroy(), pthread_mutex_lock() and pthread_mutex_unlock(). Java has a different solution for simple locking by the use of the synchronized keyword. In Java all objects have a basic locking mechanism and when a synchronized statement or method is run, the object's monitor is locked and thus stopping all other threads to invoke synchronized methods or statements on that object.

A Locking example in Java

A while ago I began to explore the mutex and condition variable possibilities in Java and made a few tests. Here is an example of how to create and use a simple locking class in Java: The lock class: SimpleLock.java, Simple test program: SimpleLockTest.java The locking class implements an exclusive lock (only one owner) using the condition variable method of locking and supports recursive locking (a thread can lock the same lock many times, and must unlock it the same number of times to finally release it). Locking requests on a busy lock will suspend the calling thread and place it in a FIFO queue.

Monitors

A monitor is a lock that encapsulates variables and method invocations.

Condition variables

Mutexes should only be used for short time locking. If the lock is to be held for a while another mechanism should be used called condition variables. In POSIX these functions are used for condition variables: pthread_cond_init(), pthread_cond_destroy(), pthread_cond_wait(), pthread_cond_timedwait(), pthread_cond_signal() and pthread_cond_broadcast(). Java has support for condition variables and it is too included in every object. All Java objects has Object as super-class and Object has these methods: wait(), wait(timeout), notify() and NotifyAll(). notify() is the equivalent to pthread_cond_signal(), and notifyAll() is the equivalent to pthread_cond_broadcast().

To use the functions/methods (not including POSIX init/destroy functions) for condition variables, we have to have a lock on a mutex first. Waiting on a condition variable could mean that the thread must wait for some time before the wait function/method returns. Wait methods/functions therefore release the mutex/synchronized monitor in the meantime. When the thread wakes up the mutex is re-aquired before the wait method/function returns. Other threads are woken up by invoking pthread_cond_signal() / notify(). signal() / notify() wakes up one sleeping thread, but it is not specified which thread. The choice is up to the implementation. It is guaranteed that at least one thread is woken. phtread_cond_broadcast() / notifyAll() wakes up all sleeping threads.

Semaphores

A semaphore is a locking structure like a mutex, with one big difference: A semaphore can be locked several times simultaneous. Mutexes are either free or busy and only one thread can own it at a time. Mutexes are sometimes called binary semaphores. Semaphores have an amount of how many times they can be locked. If a thread tries to lock a semaphore that have been maximum locked, it is suspended until a lock is released. The locking and unlocking routines are sometimes called "down" and "up" where "down" means to lock and "up" to unlock. POSIX have a standardized set of semaphore functions. sem_init() and sem_open() are used to create or find existing semaphores. sem_close() and sem_unlink() are used to release and. delete semaphores. sem_wait() and sem_trywait() are used to lock (down)semaphores, and sem_post() to unlock (up). Java has no primitive built-in support for semaphores, but a semaphore class can be easily built using a counter and synchronize/wait/notify-operations.

Barriers

Barriers are an easy to use construction that is very useful for synchronizing two or more threads. A group of threads that want to synchronize themselves enters the barrier. Only when all threads have entered are they allowed to continue executing. Besides easy synchronizing, this construct can be used to create a multi-threaded "pipeline" architecture, where the different threads represent steps in the pipeline and communicates with the previous and next steps using shared variables protected by barriers. This kind of pipeline can speed up applications that spend much time waiting for I/O, making use of the I/O wait time for other processing.

Barrier example in Java

Here is a simple example of a barrier class in Java and a corresponding test class: Barrier.java, BarrierTest.java.

A problem with locks and multiple threads

We can run into problems with locking structures if we are not careful. One big problem occurs if the different threads have different priorities. Let's say a low prioritized thread gets the lock. A high priority thread starts to run and tries to lock on our locking structure and gets suspended until the lock is released. Because the owning thread has a low priority it might take a while for its calculations to be completed, because other medium priority threads are favored by the scheduler to be run on the CPU. It may take a while before the low priority thread is done and can release the lock. This becomes a problem when the high priority thread must run as fast as possible which is the meaning of giving it higher priority. This dilemma is called "Priority inversion". A solution to this problem is to give the thread that owns the lock the same priority as the high priority task that waits for the lock. When the lock is released by the first thread, its priority is reset to what it was before. This avoids the inversion and the high priority task will run as fast as possible. If more than one thread waits for the lock, the owning thread's priority should be set to the highest priority among the waiters.

The GUI Thread

Most operating systems uses/can use a graphical user interface, GUI, of some sort, and it is common and user friendly that applications make use of them. A few GUI programming environments are multi-threaded too, which is good. Other environments can't handle multi-threading directly. This is common among systems that have a few years on the neck. Those old systems work best if you keep all GUI operations within a single thread. There is a common pitfall with a single GUI thread that can lead to some problems. GUI environments tend to be based on call-back functions or methods that gets called whenever something has been input or clicked in the graphical application. The GUI thread paints the buttons and input gadgets if they change or need to be refreshed. The GUI thead also makes all the call-back calls into the application. If, for example, a certain calculation need to be done if the user clicks on a button, it is very easy just to do the calculation in the button's call-back method. If it is a small calculation, it usually works, but if it is something that might take some time (maybe half a second or more) like if a file needs to be parsed etc. some bad side effects might happen. If something happens while a call-back is running, like if the application window gets resized, then the buttons and input gadgets may have to be repainted. Repainting is the job for the GUI thread, but it is occupied and the repaint request get queued in the GUI thread's queue of unprocessed events. The repaint won't run until the call-back returns leading to an ugly window with graphical errors in the meantime. Another bad thing is that the GUI thread might have higher priority than ordinary threads, to keep the graphics going even if other threads use all CPU power available. It is not likely the intention that a heavy calculation should run with the higher priority.

Why didn't the GUI system designers use some more threads to handle the graphical events and painting, and not just one? Well, they could have done that, but there are some reasons for not using more than one. One reason is that GUI systems want to guarantee to the application that all call-backs are called in the exact same order as the user clicked or input'ed something. The GUI system have no idea if clicking one button before another is important or not, that depends on the application, and therefore it guarantees the order to be able to work with situations where it does matter. Using a single thread that makes all call-back calls one by one in the event order is an easy and, in most cases, working solution. Another reason for not having several threads is that more threads makes things much more complicated for programmers that don't want or lack knowledge of how to use multi-threading in their programs. A single GUI thread guarantee that no code will be run in parallel. You don't have to care about avoiding race contitions or if things need to be synchronized. Other good reasons is that more precious memory and CPU is used with more threads, and more than one thread introduces performance problems. The threads will have to synchronize with each other quite often to be able to guarantee event orders etc., and heavy synchronizations usually lead to performance penalties.

The solution to the problems caused by occupying the GUI thread is to use separate threads for time consuming calculations triggered by call-backs. A call-back just sends a message to the thread to do the calculations and then returns, freeing the GUI thread so it can do what it needs to do uninterrupted. There is of course a drawback with this, that you have to watch out for. If the GUI thread is free'd immediately, it can make another call-back call from another input event, while the processing of the first event is still running. This can lead to problems if the first event's processing determine that the button or whatever produced the second event should be hidden or disabled. The second event call-back could be executed even though it shouldn't have. That would not have been a problem with the one-thread GUI, because the second event had been taken care of after the return of the call-back of the first event. If the first event's call-back disabled the source of the second event, the GUI thread would have known that and discarded it. If a separate thread is used and the GUI thread is returned, things like this must be taken care of manually, possibly introducing synchronizations and locks. The gain of a responsive GUI can be worth this extra work though.

- Moggen