Tuesday, May 22, 2007

Synchronizing my thoughts on Synchronized collection

I admit I was not exactly aware of how Synchronized collections work and what it means exactly to get a Synchronized version of a non-thread-safe collection. It all started with the following code that uses a non-synchronized Queue collection of C# to maintain a queue of tasks:

Main Thread:

public void EnqueueTask()
{
// Create a new task t
// Enqueue it into the Queue
JobQueue.Enqueue(t);
}

Consumer Thread:

public void DequeueTask()
{
// Check if number of items in the queue is > 0 and dequeue the task if true
if(JobQueue.Count > 0)
{
JobQueue.Dequeue();
}
}

Note that it is known for sure that there is only one "Producer" thread that is adding tasks to the queue and there is exactly one "Consumer" thread that is removing tasks from the queue. There are not multiple consumers.

Questions that can arise when you see the code:

1. Is this code thread safe? After all, a shared Queue object is being accessed by two different threads - one adding to it and the other removing from it. So, our standard knowledge about threads and synchronization says that we need to do some synchronization here.

2. How to achieve synchronization, if it is needed?
a. Queue is by default not thread safe. Can I make use of the Synchronized Queue collection instead?
b. Synchronization and mutual exclusion can also be achieved by obtaining a lock and making sure only one of the two threads can access the Queue object at any time.

3. With this setup (knowing that there is only one producer and only one consumer), do I really need the synchronization?

The third question took most of my time today; the logic behind that thinking was this:

Even if there are two threads T1 and T2, calling EnqueueTask and DequeueTask respectively, at any point of time only one of them can be active, using up the CPU time slice allotted to it. Further, shouldn't the "Enqueue" and "Dequeue" operations on the Queue be atomic? What that means is, thread T1 can get preempted by thread T2 either before the call to the Enqueue method or after the call to Enqueue is over, but T1 cannot be preempted while it is in the middle of enqueuing the task. Fair enough?

I tried searching a lot on whether these operations defined on the Queue object are atomic or reentrant, but never really found a concrete, to-the-point explanation. I am still not 100% sure I am right, but after some thinking I arrived at the following conclusions based on what I read mostly on the Net:

1. Queue is said to be non-thread-safe by default. This most likely means that the operations defined on this object are not atomic or reentrant. That is, it is in fact possible for thread T1 to get preempted in the middle of an "Enqueue" call by thread T2, which then tries to call "Dequeue". In that case, since both functions work on the same object and the same memory locations, weird, wrong results might be obtained.

2. I found an implementation of a Synchronized version of the Queue class. The following is how the "Enqueue" method is implemented in the synchronized Queue class, which wraps a non-synchronized Queue object:

public override void Enqueue(object value)
{
lock (syncRoot)
{
queue.Enqueue(value);
}
}

This implementation makes it apparent that the synchronized version simply obtains a "lock" on the sync root around each call to the wrapped Queue.

3. What the synchronized collection guarantees is that no two threads can call operations on the synchronized collection object simultaneously. An interesting point to note is that it will, however, not guarantee thread safety during enumeration, or when a sequence of these operations needs to be executed as part of a critical section. What does this mean?

Consider the case from our earlier example when there can be multiple consumers, i.e. multiple threads trying to dequeue tasks from our Queue. In that situation, using the Synchronized version of the Queue will not really help. The code:

if(queue.Count > 0)
{
queue.Dequeue();
}


has to be executed atomically. Otherwise, two threads T1 and T2 can both read a Count > 0 and try to Dequeue a task. In case there is only one task in the Queue, the thread that dequeues later will get an exception. This problem can only be solved by obtaining a lock before entering the critical section.
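To make the multiple-consumer case concrete, here is a minimal sketch of guarding the check-then-act sequence with a lock. The class and member names are my own illustration, not from any framework:

```csharp
using System;
using System.Collections;

class TaskQueue
{
    private readonly Queue jobQueue = new Queue();
    private readonly object syncRoot = new object();

    public void EnqueueTask(object task)
    {
        lock (syncRoot)
        {
            jobQueue.Enqueue(task);
        }
    }

    // Returns null when the queue is empty, instead of throwing.
    public object DequeueTask()
    {
        lock (syncRoot)
        {
            // The Count check and the Dequeue happen under one lock,
            // so no other consumer can sneak in between them.
            if (jobQueue.Count > 0)
            {
                return jobQueue.Dequeue();
            }
            return null;
        }
    }
}
```

Note that even if we used the Synchronized wrapper here, this whole sequence would still need its own lock, which is exactly the point made above.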

4. Should we use Synchronized collections?

Both C# and Java provide Synchronized collections as helper classes, and developers will obviously be tempted to use them. But here are a few points to know and remember before or while using them:

a. A Synchronized collection DOES NOT solve all the synchronization problems for us. Point 3 above gives an example.

b. If your code has a sequence of operations on the collection that must execute atomically (as shown in Point 3), or you need to enumerate the collection, it is better not to use the synchronized collection. You would have to take a lock around the whole sequence anyway, so the synchronization done internally by the Synchronized collection would only add unnecessary overhead.

c. I read in one of the articles that locking the collection object ourselves can give better results than using the synchronized wrapper. I have not tried this out myself, so cannot really comment on it.

d. The Synchronized collection is, after all, a wrapper over the actual collection object. With the wrapper methods in place, every call to any method on the object will incur the overhead of synchronization.

Any comments / clarifications / corrections in this blog are welcome!

Friday, May 18, 2007

Weak References

By default, when we use a "new" operator, we get hold of a strong reference. A strong reference to an object prevents the Garbage Collector from removing the object from the Heap. The GC will not garbage collect any object as long as there are one or more strong references to it.

While this is the behavior we expect most of the times, there are times when we may not want to hold a strong reference to an object and prevent it from getting garbage collected.

A typical example of this is a Cache. While implementing a home-grown Cache mechanism, it is useful to make use of what are called "Weak References". A weak reference allows the garbage collector to collect the object while still allowing the application to access it. This means that in the window after all strong references to the object are gone, but before the garbage collector has actually collected the object from the heap, the object is still accessible via the weak reference.

Example where Weak Reference can be used:

Let's say there is a huge DataSet that we are maintaining in memory. The DataSet is displayed to the user on one page. Now the user visits another page of the application, so the DataSet maintained in memory is not really required. Before the user moves to the other page, what we can do is nullify the strong reference to the DataSet while maintaining only a weak reference to it. What this means is, while the user visits other pages of the application, if the GC runs and needs to free memory, the DataSet can be garbage collected. In case the other pages do not need a lot of memory, the GC will not run and the DataSet will not be garbage collected. If the user visits the same page again, the DataSet will still be in memory and can be referenced using the Weak Reference. Thus, a weak reference makes sure that we do not unnecessarily hold on to big objects in memory and prevent the GC from collecting them.

The basic pattern for the use of a WeakReference would look like this:

// Create a weak reference to the DataSet ds.
private WeakReference Data = new WeakReference(ds);

// GetData method that makes use of the Weak Reference
public DataSet GetData()
{
DataSet data = (DataSet)Data.Target;
if (data != null)
{
return data;
}
// The DataSet has been collected - load it again...
data = GetBigDataSet();
// ...and point the Weak Reference at it for later use.
Data.Target = data;
return data;
}
It should be clear by now how Weak References help when implementing a Cache. A cache can maintain weak references to its objects, which means that if memory needs to be cleaned up, objects in the cache that are no longer strongly referenced can be picked up by the GC, freeing up the memory.

Note: While a Weak Reference is a good thing for saving on memory utilization, weak references should preferably be used only for large objects, because with small objects the WeakReference object itself could be larger than the object it refers to.
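The cache idea above can be sketched roughly as follows. The class and method names are my own illustration, not a standard API; a real cache would also purge entries whose targets have been collected:

```csharp
using System;
using System.Collections;

class WeakCache
{
    private readonly Hashtable table = new Hashtable();

    public void Put(object key, object value)
    {
        // Store only a weak reference, so the GC remains free
        // to collect the cached value under memory pressure.
        table[key] = new WeakReference(value);
    }

    public object Get(object key)
    {
        WeakReference wr = (WeakReference)table[key];
        // Target becomes null once the underlying object has been collected.
        return wr == null ? null : wr.Target;
    }
}
```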

Sunday, May 13, 2007

Improving ASP .NET Performance - Part I

Read a very good, comprehensive article on "Improving ASP .NET Performance" on MSDN. Following are some important and interesting points mentioned in it, particularly concentrating on improving the performance of ASP .NET pages:

1. Trim your page size

Not something that is at the top of our priority list when we think about improving performance! But large pages increase the response times experienced by the client and increase the consumption of network bandwidth, thereby also increasing the load on the CPU. To trim the page size:

a. Remove extra white spaces and tabs (though good coding practice would tell you to keep them for better readability of the code)

b. Use script includes for static JavaScript so that the scripts can be cached for subsequent requests.

c. Disable view state when you do not need it

d. Limit the use of graphics, consider using compressed graphics

e. Use CSS for styling to avoid sending same formatting directives to the client repeatedly

f. Avoid long control names! (Again something that contradicts good coding practice rules)

2. Use Page.IsPostBack check in the page code-behind to avoid execution of instructions that need to be executed only once when the page loads for the first time
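The IsPostBack check typically looks like this. The page class and the BindData method are placeholder names of my own, standing in for whatever one-time work the page does:

```csharp
using System;
using System.Web.UI;

// Hypothetical code-behind class for illustration only.
public class SearchPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        if (!IsPostBack)
        {
            // Executes only on the initial GET request for the page,
            // not on subsequent postbacks of the same page.
            BindData();
        }
    }

    private void BindData()
    {
        // Placeholder for one-time setup: binding grids,
        // populating dropdowns, and so on.
    }
}
```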

3. Data Binding in pages:

a. Avoid using Page.DataBind method, since it is a page-level method. It internally invokes DataBind method on all the controls on the page that support data binding. Instead, as far as possible, call DataBind explicitly on required controls.

b. Minimize calls to DataBinder.Eval

DataBinder.Eval uses reflection to evaluate the arguments that are passed to it. This can be quite time-consuming and expensive when there are a lot of rows and columns in the table. Instead, one can use explicit casting (cast the data item to the DataRowView class), or use the ItemDataBound event when the record being bound contains a lot of fields.

4. Partial Page or Fragment Caching

In cases when caching of the entire page is not possible by using the OutputCache directive (this could be the case when parts of the page are dynamic and change frequently), it is possible to enable fragment caching only for specific portions of a page. These portions need to be abstracted out into user controls. The user controls on a page can be cached independently of the page. Typical examples are headers and footers, navigation menus etc.

5. If the same user control is repeated on multiple pages, make the pages share the same instance of the user control by setting the "Shared" attribute of @ OutputCache directive of the user control to true. This will save a significant amount of memory.

Saturday, May 12, 2007

.NET ThreadPool - Pitfalls and gotchas

Multithreading is used extensively in user interface applications, mainly to perform some time consuming operations in the background, while keeping the user interface active at the same time and not having to block the user. While multithreading is good, having too many threads active at a point in time can adversely affect the performance instead of improving it, just because of the number of expensive context switches that need to be performed.

A middle way, therefore, is to make use of Thread Pools. .NET provides a ready-made implementation of a thread pool in the form of the System.Threading.ThreadPool class. A single thread pool (with a default size of 25 worker threads per processor) is maintained by the CLR for each process. Asynchronous tasks can be performed using the methods of this class, typically by calling the QueueUserWorkItem method, which queues user requests to be picked up by available threads in the pool.
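As a minimal sketch of queuing a work item, where the task body is just illustrative:

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        ManualResetEvent done = new ManualResetEvent(false);

        // The delegate runs on a pool thread; the second argument
        // is passed through as the state object.
        ThreadPool.QueueUserWorkItem(delegate(object state)
        {
            Console.WriteLine("Working on " + state);
            done.Set();
        }, "task-1");

        // Pool threads are background threads, so wait for the
        // work item to finish before Main exits.
        done.WaitOne();
    }
}
```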

While the use of the ThreadPool makes things a lot easier on the developer (all the intricacies of creating, managing and destroying a thread are hidden and happen behind the scenes) and also improves performance (a quick comparison between a manual Thread.Start() and ThreadPool.QueueUserWorkItem() shows a big difference), there are some pitfalls / points to remember / gotchas when it comes to using the ThreadPool. Following are some:

1. ThreadPool is leveraged by the .NET framework for a lot of tasks. ADO .NET, .NET Remoting, Timers, built-in delegate BeginInvoke methods - all of them internally make use of the ThreadPool. So this means, that the thread pool does not belong to your application alone, but is being used and loaded by the framework itself.

2. The tasks queued up using QueueUserWorkItem can remain in a wait state for a long time, but the actual work required for each task should be small and fast, in order to avoid blocking a pool thread for too long.

3. Once a task is submitted to the queue, there is no control over the thread that executes it, no way to get the state or set the thread's priority. It is not possible to create named threads using the ThreadPool class and therefore there is no way to track a particular thread. It is therefore best to use the ThreadPool only when you want to run independent tasks asynchronously, with no need to prioritize them, or make sure they run in a particular order.

4. One ThreadPool is created per process - which can possibly have multiple AppDomains. So, if one application using the ThreadPool behaves badly, another application in the same process runs the risk of getting affected!

5. It is critical to remember to write the code in such a way that deadlocks do not occur. While this is the very basic care one should take while using threads, it becomes pronounced with the use of ThreadPool because of point number 1 mentioned above. The catch is explained below:

Let us say there is a method called "ConnectTo" that opens and closes a socket using the "BeginConnect" and "EndConnect" methods of .NET, which internally make use of the ThreadPool. There is a task "WriteToSocket" that is submitted to the queue to make use of the ThreadPool. Now imagine there are 2 such tasks created, with the pool size being 2. The situation is that the two threads in the ThreadPool are already blocked by the "WriteToSocket" tasks. Each of these tasks, however, calls "ConnectTo", which requires a thread from the ThreadPool in order to execute the asynchronous "BeginConnect" method. If you get the picture - what has happened in this case is the famous deadlock situation.

Some rules of thumb to remember to avoid a situation as above:

a. Do not create any class whose synchronous methods wait for asynchronous functions, since this class could be called from a thread on the pool.

b. Do not use any class inside an asynchronous function if the class blocks waiting for asynchronous functions

c. Do not ever block a thread executed on the pool that is waiting for another function on the pool - so basically know which of the .NET built-in functions make use of the ThreadPool!

Friday, May 04, 2007

Showing a Sort Order Indicator in Header of GridView control of ASP .NET 2.0

It doesn't take a whole lot of effort to provide sorting on columns in a GridView control of ASP .NET 2.0. However, it does not have built-in support for showing an icon or an image to indicate the column on which the table is sorted and the order in which it is sorted.

To enable sorting and to show a sort order indicator in the column header of a GridView, the following things need to be done:

1. In the .aspx page, define event handlers for "OnSorting" and "OnRowCreated" events. The OnSorting event gets called whenever the grid is sorted by clicking on a column header and OnRowCreated event is called when a row in the grid gets created. Also set the "AllowSorting" attribute to true. The following code snippet shows the attributes of the GridView control:

AllowSorting="True" OnSorting="gridViewInvoiceSearchResult_OnSort" OnRowCreated="gridViewInvoiceSearchResult_OnRowCreated"

2. Write the event handlers for OnSorting and OnRowCreated events in the code-behind page.

3. The GridViewSortEventArgs parameter passed to the OnSorting event handler contains the sort direction (ascending or descending) and the sort expression (used to identify the column being sorted on; a column in the GridView can be associated with a SortExpression when defining its binding to a data field). Store these values in member variables declared in the code-behind page.
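The OnSorting handler from this step could look like the following sketch. Here m_SortExp and m_SortDirection are the member variables mentioned above; in a real page they would usually be kept in ViewState so they survive postbacks:

```csharp
protected void gridViewInvoiceSearchResult_OnSort(object sender, GridViewSortEventArgs e)
{
    // Remember which column was sorted and in which direction,
    // so the OnRowCreated handler can pick the right header cell
    // and the right image.
    m_SortExp = e.SortExpression;
    m_SortDirection = e.SortDirection;
}
```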

4. In the OnRowCreated event, use the SortExpression and SortDirection values stored earlier to determine which image to add and which column to add it to. The following code snippet shows the OnRowCreated event handler:

protected void gridViewInvoiceSearchResult_OnRowCreated(object sender, GridViewRowEventArgs e)
{
// Check whether the row is a header row

if (e.Row.RowType == DataControlRowType.Header)
{
// m_SortExp is the sort expression stored in the OnSorting event handler

if (String.Empty != m_SortExp)
{
// Based on the sort expression, find the index of the sorted column

int column = GetSortColumnIndex(this.gridViewInvoiceSearchResult, m_SortExp);
if (column != -1)
// Add an image to the sorted column header depending on the sort direction

AddSortImage(e.Row, column, m_SortDirection);
}
}
}

// Method to get the index of the sorted column based on SortExpression

private int GetSortColumnIndex(GridView gridView, String sortExpression)
{
if (gridView == null)
return -1;
foreach (DataControlField field in gridView.Columns)
{
if (field.SortExpression == sortExpression)
{
return gridView.Columns.IndexOf(field);
}
}
return -1;
}

// Method to add the sort icon to the column header

private void AddSortImage(GridViewRow headerRow, int column, SortDirection sortDirection)
{
if (-1 == column)
{
return;
}
// Create the sorting image based on the sort direction.
Image sortImage = new Image();
if (SortDirection.Ascending == sortDirection)
{
sortImage.ImageUrl = "~/down.gif";
sortImage.AlternateText = "Ascending order";
}
else
{
sortImage.ImageUrl = "~/up.gif";
sortImage.AlternateText = "Descending order";
}
// Add the image to the appropriate header cell.
headerRow.Cells[column].Controls.Add(sortImage);
}