I'm converting some C++ code to C# and it calls std::map::lower_bound(k) to find an entry in the map whose key is equal to or greater than k. However, I don't see any way to do the same thing with .NET's SortedDictionary. I suspect I could implement a workaround using SortedList, but unfortunately SortedList is too slow (O(n) for inserting and deleting keys). What can I do?
Note: I found a workaround that takes advantage of my particular scenario... Specifically, my keys are a dense population of integers starting at just over 0, so I used a List<TValue> as my dictionary with the list index serving as the key, and searching for a key equal to or greater than k can be done in only a few loop iterations. But it would still be nice to see the original question answered.
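For what it's worth, a minimal sketch of that workaround (my reconstruction; it assumes a reference-type TValue with null marking an unused slot):

    // The list index is the key; a null slot means "no entry for this key".
    static int FindKeyAtLeast<TValue>(List<TValue> table, int k) where TValue : class
    {
        for (int i = Math.Max(k, 0); i < table.Count; i++)
            if (table[i] != null)
                return i;    // the first occupied slot whose index (key) is >= k
        return -1;           // no key >= k exists
    }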
I have developed several collection classes that support "find next higher key" and "find next lower key" operations.
First I made a set of Compact Patricia Trie collections. These are sets/dictionaries designed to minimize memory usage; they are especially efficient for large collections of URLs and certain other kinds of data. They only partly solve the problem, because only certain kinds of keys are supported, namely byte[], string, and all primitive integer types (Int8..UInt64). Also, string sorting is case-sensitive. NuGet package: Loyc.Utilities
After publishing this answer, I made more sorted data structures that solve this problem generally: BList<T>, BDictionary<K,V>, BMultiMap<K,V>, and SparseAList<T>. See my second answer for details.
The problem is that a dictionary/hash table is designed to arrive at a unique memory location based on an input value, so you'll need a data structure that is designed to associate a range with each value you store, and to keep those intervals updated correctly as items are added and removed.
I think skip lists (or balanced binary trees) can help you. Although they cannot perform lookups in O(1) like a hash table, they can do them in O(log n), and in practice skip lists are often faster than balanced trees.
I know this is not a proper answer, since I cannot say that the .NET BCL already contains such a class; you'll unfortunately have to implement one yourself, or find a 3rd-party assembly that supports it for you. There seems to be a nice example over at The CodeProject here, though.
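To make the idea concrete, here is a minimal skip-list sketch of my own (integer keys only, insert plus a lower-bound query); it is only an illustration of the approach, not the CodeProject code referred to above:

    // Minimal skip list: Insert(key) and LowerBound(k), which returns the
    // smallest stored key >= k, or null if there is none (like std::map::lower_bound).
    public class IntSkipList
    {
        private const int MaxLevel = 16;

        private class Node
        {
            public readonly int Key;
            public readonly Node[] Next;
            public Node(int key, int level) { Key = key; Next = new Node[level]; }
        }

        private readonly Node _head = new Node(int.MinValue, MaxLevel);
        private readonly Random _rng = new Random();
        private int _level = 1;

        public void Insert(int key)
        {
            // Find the rightmost node with Key < key on every level.
            var update = new Node[MaxLevel];
            Node x = _head;
            for (int i = _level - 1; i >= 0; i--)
            {
                while (x.Next[i] != null && x.Next[i].Key < key)
                    x = x.Next[i];
                update[i] = x;
            }

            // Choose a random height for the new node (coin flips).
            int lvl = 1;
            while (lvl < MaxLevel && _rng.Next(2) == 0)
                lvl++;
            for (int i = _level; i < lvl; i++)
                update[i] = _head;
            if (lvl > _level)
                _level = lvl;

            // Splice the new node into each level it participates in.
            var node = new Node(key, lvl);
            for (int i = 0; i < lvl; i++)
            {
                node.Next[i] = update[i].Next[i];
                update[i].Next[i] = node;
            }
        }

        public int? LowerBound(int k)
        {
            Node x = _head;
            for (int i = _level - 1; i >= 0; i--)
                while (x.Next[i] != null && x.Next[i].Key < k)
                    x = x.Next[i];
            return x.Next[0] != null ? x.Next[0].Key : (int?)null;
        }
    }

For example, after Insert(3), Insert(10), Insert(7), a call to LowerBound(5) returns 7.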
You can try the code I wrote below. It uses binary search, so it assumes the list/array is pre-sorted.
public static class ListExtensions
{
    public static int GetAtMostIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer)
    {
        return GetAtMostIndex(list, value, comparer, 0, list.Count);
    }

    public static int GetAtLeastIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer)
    {
        return GetAtLeastIndex(list, value, comparer, 0, list.Count);
    }

    // Returns the index of the last item <= value within [index, index + count), or -1 if none exists.
    public static int GetAtMostIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer, int index, int count)
    {
        if (count == 0)
        {
            return -1;
        }

        int startIndex = index;
        int endIndex = index + count - 1;
        int middleIndex = 0;
        int compareResult = -1;

        while (startIndex < endIndex)
        {
            middleIndex = (startIndex + endIndex) >> 1; // midpoint
            compareResult = comparer.Invoke(list[middleIndex], value);

            if (compareResult > 0)
            {
                endIndex = middleIndex - 1;
            }
            else if (compareResult < 0)
            {
                startIndex = middleIndex + 1;
            }
            else
            {
                return middleIndex;
            }
        }

        if (startIndex == endIndex)
        {
            compareResult = comparer.Invoke(list[startIndex], value);
            if (compareResult <= 0)
            {
                return startIndex;
            }
            else
            {
                int returnIndex = startIndex - 1;
                if (returnIndex < index)
                {
                    return -1;
                }
                else
                {
                    return returnIndex;
                }
            }
        }
        else
        {
            // startIndex has passed endIndex: list[startIndex] > value and everything
            // skipped below startIndex is < value, so the answer is the previous
            // position, unless that falls before the start of the range.
            int returnIndex = startIndex - 1;
            if (returnIndex < index)
            {
                return -1;
            }
            else
            {
                return returnIndex;
            }
        }
    }

    // Returns the index of the first item >= value within [index, index + count), or -1 if none exists.
    public static int GetAtLeastIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer, int index, int count)
    {
        if (count == 0)
        {
            return -1;
        }

        int startIndex = index;
        int endIndex = index + count - 1;
        int middleIndex = 0;
        int compareResult = -1;

        while (startIndex < endIndex)
        {
            middleIndex = (startIndex + endIndex) >> 1; // midpoint
            compareResult = comparer.Invoke(list[middleIndex], value);

            if (compareResult > 0)
            {
                endIndex = middleIndex - 1;
            }
            else if (compareResult < 0)
            {
                startIndex = middleIndex + 1;
            }
            else
            {
                return middleIndex;
            }
        }

        if (startIndex == endIndex)
        {
            compareResult = comparer.Invoke(list[startIndex], value);
            if (compareResult >= 0)
            {
                return startIndex;
            }
            else
            {
                int returnIndex = startIndex + 1;
                if (returnIndex >= index + count)
                {
                    return -1;
                }
                else
                {
                    return returnIndex;
                }
            }
        }
        else
        {
            // startIndex has passed endIndex: everything below startIndex is < value,
            // and list[endIndex + 1] (== list[startIndex]) was found to be > value.
            return endIndex + 1;
        }
    }
}
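As a quick (hypothetical) usage example of the helpers above: a SortedList<K,V> keeps its Keys collection sorted and exposes it as an IList<K>, so lower_bound can be emulated like this:

    var sorted = new SortedList<int, string> { { 2, "b" }, { 5, "e" }, { 9, "i" } };

    // Index of the first key >= 4, or -1 if there is no such key.
    int idx = ListExtensions.GetAtLeastIndex(sorted.Keys, 4, (item, value) => item.CompareTo(value));
    if (idx >= 0)
        Console.WriteLine("First key >= 4 is " + sorted.Keys[idx]);  // prints 5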
I created several data structures that provide this functionality for any data type: BList<T> (a sorted list), BDictionary<K,V> (a dictionary whose items are sorted by key), and BMultiMap<K,V> (a dictionary in which more than one value can be associated with a key). See this article for details. Each of these data structures provides FindLowerBound() and FindUpperBound() methods that work like C++'s lower_bound and upper_bound. Internally, these collections are similar to B+ trees, so they have good performance and low memory usage; BDictionary<,> typically uses about 44% less memory than a standard SortedDictionary<,> (which in turn uses, on average, slightly less memory than Dictionary<,>), assuming 64-bit keys and 64-bit values.
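As a rough illustration only (the exact signatures and overloads are documented in the linked article; the index-returning form below is my assumption), a lower-bound lookup with BList<T> might look like this:

    // Assumption: FindLowerBound(x) returns the index of the first element >= x,
    // or Count if every element is smaller. Check the article for the real overloads.
    var list = new BList<int>();
    list.Add(2);
    list.Add(5);
    list.Add(9);

    int i = list.FindLowerBound(4);
    if (i < list.Count)
        Console.WriteLine(list[i]);  // the first element >= 4, i.e. 5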
I also made a "sparse" collection, SparseAList<T>
, which is similar to BDictionary<int,T>
except that you can insert and remove "empty space" anywhere in the collection (empty space does not consume any memory). See this article for details.
All of these collections are in the Loyc.Collections NuGet package.
Find the smallest key equal to or greater than K:
dict.Keys.Where(i => i >= K).OrderBy(i => i).First();
Or, much faster (a single pass over the keys instead of filtering and sorting):
public static int? GetNearestKey<TValue>(IDictionary<int, TValue> dict, int k)
{
    int? lowerK = null;
    foreach (int key in dict.Keys)
    {
        if (key == k)
        {
            lowerK = k;
            break;
        }
        else if (key >= k && (!lowerK.HasValue || key < lowerK))
        {
            lowerK = key;
        }
    }
    return lowerK;
}
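A quick (hypothetical) usage example, assuming the method above is in scope:

    var dict = new Dictionary<int, string> { { 2, "b" }, { 7, "g" }, { 11, "k" } };
    int? nearest = GetNearestKey(dict, 4);   // scans every key once: O(n)
    Console.WriteLine(nearest);              // prints 7, the smallest key >= 4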
There isn't a binary search tree collection implementation in the base framework, so you'll either have to build one or find an implementation. As you noted, SortedList is closest in terms of searching but is slower (due to its underlying array implementation) for insertion/deletion.
One note on the SortedList complexity mentioned in the question: inserting an item that sorts to the end of the list is O(log n) amortized (and O(log n) worst case if the capacity is preallocated); it is insertion at an arbitrary position that is O(n), because the underlying arrays must shift their elements.
You can do this for SortedSet<T> with the following extension methods:
public static class SortedSetExtensions
{
    public static bool FindLowerOrEqualThan<T>(this SortedSet<T> set, T value, out T first)
    {
        if (set.Count == 0)
        {
            first = default(T);
            return false;
        }

        var minimum = set.Min;
        if (set.Comparer.Compare(minimum, value) > 0)
        {
            // Every element is greater than value.
            first = default(T);
            return false;
        }

        // The largest element in [Min, value] is the "equal or lower" match.
        first = set.GetViewBetween(minimum, value).Max;
        return true;
    }

    public static bool FindGreaterOrEqualThan<T>(this SortedSet<T> set, T value, out T first)
    {
        if (set.Count == 0)
        {
            first = default(T);
            return false;
        }

        var maximum = set.Max;
        if (set.Comparer.Compare(maximum, value) < 0)
        {
            // Every element is less than value.
            first = default(T);
            return false;
        }

        // The smallest element in [value, Max] is the "equal or greater" match.
        first = set.GetViewBetween(value, maximum).Min;
        return true;
    }
}
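A quick (hypothetical) usage example; since SortedDictionary<K,V> has no GetViewBetween, one option is to mirror its keys in a SortedSet<K> and query that:

    var keys = new SortedSet<int> { 1, 3, 5, 8 };

    if (keys.FindGreaterOrEqualThan(4, out int atLeast))
        Console.WriteLine(atLeast);  // prints 5 (smallest element >= 4)

    if (keys.FindLowerOrEqualThan(4, out int atMost))
        Console.WriteLine(atMost);   // prints 3 (largest element <= 4)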