Ever since I began to code for .NET I’ve been confused by the host of collections available in the .NET Framework. On the one hand, there are so many of them in the System.Collections and System.Collections.Specialized namespaces. On the other hand, it’s a pity there’s so little “prescriptive guidance” as to when to use which collection and why.
The other day, while reading Improving .NET Application Performance and Scalability, I came across an interesting discussion of collections. For some reason I got stuck on the explanations. I spent two hours analyzing 10 measly pages on the train, a total of another 2 hours poking around various collections with Reflector, and then another hour re-reading the section and comparing it with MSDN. I simply couldn’t let it go because I realized I was completely confused.
The confusion stems from inconsistent naming of collections, strange design of some of them and misleading advice on MSDN. I’d like to share my perspective on .NET collections. Please, feel free to state your opinion, observations and experiences.
Terminology
I think the legs of confusion grow from terminology. In .NET collections are referred to as collections, lists, dictionaries. These terms are—at times—used interchangeably.
For example: SortedList. By definition, it represents a collection of key-and-value pairs that are sorted by the keys and are accessible by key and by index
. You add a pair which makes it a dictionary, not a list. Internally it maintains an array of keys and an array of values, but on the outside it’s a dictionary. I’ll get back to this point later.
Another good example is the NameValueCollection class. MSDN states that it represents a sorted collection of associated String keys and String values that can be accessed either with the key or with the index
. I don’t believe this collection has anything to do with sorting, and I guess I’m not alone in this.
I think it would’ve made more sense if collections were defined as follows:
- Collection: any data container in general, be it a list or a dictionary.
- List: a collection of objects that can be individually accessed by index, exactly as the IList interface is defined.
- Dictionary: a collection of key-and-value pairs, exactly as the IDictionary interface is defined.
Even though definitions of ICollection, IList and IDictionary are accurate, naming of classes and their functionality aren’t. As I pointed out, SortedList is out of place.
Let’s review some of the collections in common use.
ArrayList
This one is a universal soldier. It’s mission is to store items of type Object in an internal list, which means you can store anything in it. You can even add null. As elements are added, it resizes itself. Initial capacity defaults to 16.
Duplicates
Yes, tolerates duplicates.
Lookup by index
Very fast. All it does it pull out an element from an internal array by position.
Lookup by value
IndexOf performs a linear search, therefore, the average execution time is proportional to Count which is slow.
Contains performs a linear search, therefore, the average execution time is proportional to Count which is slow, too. Same definition as above, different internal implementation.
BinarySearch is faster and is recommended for efficient searches. Interestingly enough, BinarySearch can conduct case-insensitive searches if you pass it CaseInsensitiveComparer:
// Culture-agnostic search
list.BinarySearch (CaseInsensitiveComparer.DefaultInvariant)
// Culture-specific search
CaseInsensitiveComparer.Default
Here’s something that bit me today: if you pass an instance of IComparer to BinarySearch you need to sort the collection first, as MSDN points out:
If comparer is provided, the elements of the ArrayList are compared to the specified value using the specified IComparer implementation. If the ArrayList is not already sorted according to the sort order defined by comparer, the result might be incorrect.
The above sample code should look like this:
// Culture-agnostic search
list.Sort (CaseInsensitiveComparer.DefaultInvariant);
list.BinarySearch ("something",
CaseInsensitiveComparer.DefaultInvariant)
// Culture-specific search
list.Sort (list.BinarySearch);
list.BinarySearch ("something",
CaseInsensitiveComparer.Default);
Sorting
Sorting is available via the Sort method. Remember the little gotcha with ICompare (see above). Frequent sorting is expensive. It’s recommended that you make several changes in a batch and then resort the collection.
Implications
If the number of elements in the list reaches its capacity, it doubles the capacity, allocates a bigger buffer and copies all elements to it. Therefore, try to preallocate to the desired size when you know it in advance to avoid reallocations and copying.
Another trouble spot is boxing. Every time you store a value type in an ArrayList it’s boxed. If you don’t know what boxing is you really should. Read Eric Gunnerson’s Nice box. What’s in it? and Open the box! Quick! MSDN has a neat example of how to build a strongly typed collection and avoid boxing. It’s ironic that you have to build one yourself.
There’s so much to say about each collection, but I have to stop somewhere. Let’s move on and review ArrayList’s close relative StringCollection.
StringCollection
The mission of this class is to manipulate a list of string. Improving .NET Application Performance and Scalability calls it a strongly typed ArrayList.
The class used ArrayList as its element storage, therefore it closely resembles ArrayList: it’s flexible to frequent changes (beware of memory relocations, though), not so fast on sorting, provides a fast lookup by index, slow lookup by value (same ol’ IndexOf and Contains).
Ironically, StringCollection has no analoge of a fast search BinarySearch-style! Both IndexOf and Contains use slow linear search. This gaping hole was somewhat shocking to me. It pretty much renders this class useless. You design a class to manipulate lists of strings, simply delegate it to ArrayList and cut back on features!
I’m sure class library designers had something in mind, and I wonder what it was. Perhaps it prevents typecasting of elements, but the tradeoff is not fair.
Exotic Collections
Quite honestly, I don’t want to cover exotic collections, such as BitArray, Queue, Stack.
Custom Collections
To learn how to develop your own collections which avoid type casting overhead see Walkthrough: Creating Your Own Collection Class.
To Be Continued
In my second part of .NET Collection Madness I’ll review some of the dictionary-type classes, such as Hashtable, ListDictionary, HybridDictionary, etc.