Tuesday, May 22, 2007

Synchronizing my thoughts on Synchronized collection

I admit I was not exactly aware of how synchronized collections work, or of what it means exactly to get a synchronized version of a non-thread-safe collection. It all started with the following code, which uses C#'s non-synchronized Queue collection to maintain a queue of tasks:

Main Thread:

public void EnqueueTask()
{
    // Create a new task t
    // Enqueue it into the Queue
    JobQueue.Enqueue(t);
}

Consumer Thread:

public void DequeueTask()
{
    // Check if the number of items in the queue is > 0 and dequeue a task if so
    if (JobQueue.Count > 0)
    {
        JobQueue.Dequeue();
    }
}

Note that it is known for sure that there is only one "Producer" thread that is adding tasks to the queue and there is exactly one "Consumer" thread that is removing tasks from the queue. There are not multiple consumers.

Questions that can arise when you see the code:

1. Is this code thread-safe? After all, a shared Queue object is being accessed by two different threads, one adding to it and the other removing from it. Our standard knowledge about threads and synchronization says that we need to do some synchronization here.

2. How to achieve synchronization, if it is needed?
a. Queue is not thread-safe by default. Can I make use of the synchronized wrapper that Queue.Synchronized() returns instead?
b. Synchronization and mutual exclusion can also be achieved by obtaining a lock and making sure only one of the two threads can access the Queue object at any time.

3. With this setup (knowing that there is exactly one producer and exactly one consumer), do I really need any synchronization?

The third question took most of my time today; the logic behind that thinking was this:

Even if there are two threads T1 and T2, calling EnqueueTask and DequeueTask respectively, at any point in time there could be only one thread active and using up the CPU time slice allotted to it. Further, shouldn't it be the case that the "Enqueue" and "Dequeue" operations on the Queue are atomic? That would mean thread T1 can be preempted by thread T2 either before the call to Enqueue or after the call to Enqueue is over, but T1 cannot be preempted while it is in the middle of enqueuing the task. Fair enough?

I searched a lot for whether the operations defined on the Queue object are atomic or reentrant, and never really found a concrete, to-the-point explanation. I am still not 100% sure I am right, but after some thinking I arrived at the following conclusions, based mostly on what I read on the Net:

1. Queue is said to be non-thread-safe by default. This most likely means that the operations defined on it are not atomic or reentrant. That is, it is in fact possible for thread T1 to be preempted in the middle of an "Enqueue" call by thread T2, which then calls "Dequeue". Since both functions work on the same object and the same memory locations, weird, wrong results can be obtained.
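To make this concrete, imagine Enqueue as two separate steps: first write the element, then bump the count. The toy model below (sketched in Java for convenience; the class and all names are invented for illustration, this is not any real library's code) deliberately runs a consumer between the two steps, and the consumer sees an inconsistent, half-updated queue:

```java
// Toy model of a non-atomic Enqueue: the write and the count update
// are separate steps, so a thread can be preempted between them.
// All names here are invented purely for illustration.
class ToyQueue {
    Object[] items = new Object[4];
    int head = 0, count = 0;

    // Step 1 of Enqueue: store the element at the tail slot.
    void enqueueStep1(Object x) { items[head + count] = x; }
    // Step 2 of Enqueue: publish it by bumping the count.
    void enqueueStep2() { count++; }

    Object dequeue() {
        if (count == 0) return null; // nothing visible to dequeue
        Object x = items[head];
        head++;
        count--;
        return x;
    }
}

public class InterleavingDemo {
    public static void main(String[] args) {
        ToyQueue q = new ToyQueue();

        // Producer runs step 1 of Enqueue("task")... and is preempted here.
        q.enqueueStep1("task");

        // Consumer runs now: the element was written, but count is still 0,
        // so the queue looks empty -- the half-finished enqueue has left
        // the structure in an inconsistent intermediate state.
        System.out.println("mid-enqueue dequeue: " + q.dequeue()); // null

        // Producer resumes step 2; only now does the element become visible.
        q.enqueueStep2();
        System.out.println("after full enqueue:  " + q.dequeue()); // task
    }
}
```

A real Queue's Enqueue involves even more non-atomic work (growing the backing array, wrapping indices around), so the window for this kind of interleaving is real, not just theoretical.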

2. I found an implementation of a synchronized version of the Queue class. Here is how the "Enqueue" method is implemented in the synchronized Queue class, which wraps a non-synchronized Queue object:

public override void Enqueue(object obj)
{
    lock (syncRoot)
    {
        queue.Enqueue(obj);
    }
}

This implementation makes it apparent that the synchronized version simply obtains a lock on the sync root around every call to the wrapped queue.
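The same wrapper pattern can be sketched end to end in Java, which the post also mentions (the class and member names below are my own, not from any real library): every operation takes the same lock before touching the wrapped, non-thread-safe queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// A minimal synchronized wrapper in the same spirit: every public
// operation acquires the same lock (syncRoot) before touching the
// underlying non-thread-safe queue. Names invented for illustration.
class SynchronizedQueue<T> {
    private final Object syncRoot = new Object();
    private final Queue<T> queue = new ArrayDeque<>();

    public void enqueue(T item) {
        synchronized (syncRoot) { queue.offer(item); }
    }

    public T dequeue() {
        synchronized (syncRoot) { return queue.poll(); } // null if empty
    }

    public int count() {
        synchronized (syncRoot) { return queue.size(); }
    }
}
```

Java's built-in Collections.synchronizedCollection wrappers work the same way: each individual method call is guarded by one shared lock, and nothing more.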

3. What the synchronized collection guarantees is that no two threads can call operations on the synchronized collection object simultaneously. An interesting point to note is that it does not, however, guarantee thread safety during enumeration, or when a sequence of these operations needs to execute as part of one critical section. What does this mean?

Consider the case from our earlier example where there can be multiple consumers, i.e. multiple threads trying to dequeue tasks from our Queue. In that situation, using the synchronized version of the Queue will not really help. The code:

if (queue.Count > 0)
{
    queue.Dequeue();
}


has to be executed atomically. Otherwise, two threads T1 and T2 can both read a Count > 0 and try to Dequeue a task. If there is only one task in the Queue, the thread that dequeues later gets an exception (Dequeue on an empty Queue throws InvalidOperationException). This problem can only be solved by obtaining a lock before entering the critical section.

4. Should we use Synchronized collections?

Both C# and Java provide synchronized collections as helper classes, and developers will obviously be tempted to use them. But here are a few points to know and remember before or while using them:

a. A Synchronized collection DOES NOT solve all the synchronization problems for us. Point 3 above gives an example.

b. If your code has a sequence of operations on the collection that must execute atomically (as shown in point 3), or if you need to enumerate the collection, it is better not to use the synchronized collection: you have to take an external lock anyway, so the synchronization done internally by the synchronized collection only adds unnecessary overhead.
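The enumeration half of this point deserves its own example. Java's Collections.synchronizedList documents exactly this requirement: you must hold the list's lock yourself for the whole traversal, because the per-call synchronization does not cover the iterator. A sketch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Even with a synchronized wrapper, enumeration needs an external
// lock: the wrapper guards each individual call, not the traversal
// as a whole.
public class IterationDemo {
    public static void main(String[] args) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        for (int i = 1; i <= 5; i++) list.add(i);

        int sum = 0;
        // Hold the wrapper's lock for the entire loop; without this,
        // another thread could add or remove elements mid-iteration.
        synchronized (list) {
            for (int x : list) sum += x;
        }
        System.out.println("sum = " + sum); // 15
    }
}
```

So for enumeration you end up locking manually anyway, at which point the wrapper's per-call locks buy you nothing for this code path.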

c. I read in one of the articles that locking the collection object ourselves can give better results than using the synchronized wrapper. I have not tried this out myself, so I cannot really comment on it.

d. The synchronized collection is, after all, a wrapper over the actual collection object. With the wrapper methods in place, every call to any method on the object incurs the overhead of acquiring and releasing a lock.

Any comments / clarifications / corrections in this blog are welcome!

2 comments:

Anand said...

umm - not too sure point #3 is correct (or I haven't understood what you said).
There are degrees of thread safety. For example, Vector in Java is "mostly" thread-safe, but not quite, since there are state dependencies across methods.
You may also want to check out ThreadLocal in this context.

Casper van Dijk said...

I don't agree with your conclusion.

1) In reply to 4b:
Every statement of synchronization adds overhead; that is not by itself a reason not to use it. The reason not to use the enumerator is that a race condition could happen. For example: thread 1 gets an object from the collection (with 1 element) and thread 2 removes it while thread 1 is still in the loop.

d) Again, you are talking about overhead. The whole implementation of synchronization adds overhead because of the locking. It is only wasted overhead when no other threads are blocked waiting for the lock.

One way to resolve the problem of sharing data among threads is to make a thread-safe class yourself, implementing IEnumerable and doing the locking yourself.

Or to clone the collection, so its use will be private.

Or to use ISync, which is a bad idea because it is accessible by any thread, including threads you might not have control over, and could result in deadlock.

By now, there are some new concurrent collections in the .NET 4.0 framework. I know you posted this a while back, so for newcomers: use the new concurrent collections for a simpler solution.