HashSet The C# HashSet data structure was introduced in the .NET Framework 3.5. A full list of the implemented members can be found at the HashSet MSDN page.
- 开发者_如何学编程
- Where is it used?
- Why would you want to use it?
-
- A
HashSet
holds a set of objects, but in a way that allows you to easily and quickly determine whether an object is already in the set or not. It does so by internally managing an array and storing the object using an index which is calculated from the hashcode of the object. Take a look here
- A
HashSet
is an unordered collection containing unique elements. It has the standard collection operations Add, Remove, Contains, but since it uses a hash-based implementation, these operations are O(1). (As opposed to List for example, which is O(n) for Contains and Remove.)HashSet
also provides standard set operations such as union, intersection, and symmetric difference. Take a look hereThere are different implementations of Sets. Some make insertion and lookup operations super fast by hashing elements. However, that means that the order in which the elements were added is lost. Other implementations preserve the added order at the cost of slower running times.
The HashSet
class in C# goes for the first approach, thus not preserving the order of elements. It is much faster than a regular List
. Some basic benchmarks showed that HashSet is decently faster when dealing with primary types (int, double, bool, etc.). It is a lot faster when working with class objects. So the point is that HashSet is fast.
The only catch of HashSet
is that there is no access by indices. To access elements you can either use an enumerator or use the built-in function to convert the HashSet
into a List
and iterate through that. Take a look here
A HashSet
has an internal structure (hash), where items can be searched and identified quickly. The downside is that iterating through a HashSet
(or getting an item by index) is rather slow.
So why would someone want be able to know if an entry already exists in a set?
One situation where a HashSet
is useful is in getting distinct values from a list where duplicates may exist. Once an item is added to the HashSet
it is quick to determine if the item exists (Contains
operator).
Other advantages of the HashSet
are the Set operations: IntersectWith
, IsSubsetOf
, IsSupersetOf
, Overlaps
, SymmetricExceptWith
, UnionWith
.
If you are familiar with the object constraint language then you will identify these set operations. You will also see that it is one step closer to an implementation of executable UML.
Simply said and without revealing the kitchen secrets:
a set in general, is a collection that contains no duplicate elements, and whose elements are in no particular order. So, A HashSet<T>
is similar to a generic List<T>
, but is optimized for fast lookups (via hashtables, as the name implies) at the cost of losing order.
From application perspective, if one needs only to avoid duplicates then HashSet
is what you are looking for since it's Lookup, Insert and Remove complexities are O(1) - constant. What this means it does not matter how many elements HashSet
has it will take same amount of time to check if there's such element or not, plus since you are inserting elements at O(1) too it makes it perfect for this sort of thing.
精彩评论