I'm a complete LINQ newbie, so I don't know if my LINQ is incorrect for what I need to do or if my expectations of performance are too high.
I've got a SortedList of objects, keyed by int; SortedList as opposed to SortedDictionary because I'll be populating the collection with pre-sorted data. My task is to find either the exact key or, if there is no exact key, the one with the next开发者_StackOverflow higher value. If the search is too high for the list (e.g. highest key is 100, but search for 105), return null.
// The structure of this class is unimportant. Just using
// it as an illustration.
public class CX
{
public int KEY;
public DateTime DT;
}
static CX getItem(int i, SortedList<int, CX> list)
{
var items =
(from kv in list
where kv.Key >= i
select kv.Key);
if (items.Any())
{
return list[items.Min()];
}
return null;
}
Given a list of 50,000 records, calling getItem 500 times takes about a second and a half. Calling it 50,000 times takes over 2 minutes. This performance seems very poor. Is my LINQ bad? Am I expecting too much? Should I be rolling my own binary search function?
First, your query is being evaluated twice (once for Any
, and once for Min
). Second, Min
requires that it iterate over the entire list, even though the fact that it's sorted means that the first item will be the minimum. You should be able to change this:
if (items.Any())
{
return list[items.Min()];
}
To this:
var default =
(from kv in list
where kv.Key >= i
select (int?)kv.Key).FirstOrDefault();
if(default != null) return list[default.Value];
return null;
UPDATE
Because you're selecting a value type, FirstOrDefault
doesn't return a nullable object. I have altered your query to cast the selected value to an int?
instead, allowing the resulting value to be checked for null
. I would advocate this over using ContainsKey
, as that would return true
if your list contained a value for 0
. For example, say you have the following values
0 2 4 6 8
If you were to pass in anything less than or equal to 8, then you would get the correct value. However, if you were to pass in 9, you would get 0 (default(int)
), which is in the list but isn't a valid result.
Writing a binary search on your own can be tough.
Fortunately, Microsoft already wrote a pretty robust one: Array.BinarySearch<T>
. This is, in fact, the method that SortedList<TKey, TValue>.IndexOfKey
uses internally. Only problem is, it takes a T[]
argument, instead of any IList<T>
(like SortedList<TKey, TValue>.Keys
).
You know what, though? There's this great tool called Reflector that lets you look at .NET source code...
Check it out: a generic BinarySearch
extension method on IList<T>
, taken straight from the reflected code of Microsoft's Array.BinarySearch<T>
implementation.
public static int BinarySearch<T>(this IList<T> list, int index, int length, T value, IComparer<T> comparer) {
if (list == null)
throw new ArgumentNullException("list");
else if (index < 0 || length < 0)
throw new ArgumentOutOfRangeException((index < 0) ? "index" : "length");
else if (list.Count - index < length)
throw new ArgumentException();
int lower = index;
int upper = (index + length) - 1;
while (lower <= upper) {
int adjustedIndex = lower + ((upper - lower) >> 1);
int comparison = comparer.Compare(list[adjustedIndex], value);
if (comparison == 0)
return adjustedIndex;
else if (comparison < 0)
lower = adjustedIndex + 1;
else
upper = adjustedIndex - 1;
}
return ~lower;
}
public static int BinarySearch<T>(this IList<T> list, T value, IComparer<T> comparer) {
return list.BinarySearch(0, list.Count, value, comparer);
}
public static int BinarySearch<T>(this IList<T> list, T value) where T : IComparable<T> {
return list.BinarySearch(value, Comparer<T>.Default);
}
This will let you call list.Keys.BinarySearch
and get the negative bitwise complement of the index you want in case the desired key isn't found (the below is taken basically straight from tzaman's answer):
int index = list.Keys.BinarySearch(i);
if (index < 0)
index = ~index;
var item = index < list.Count ? list[list.Keys[index]] : null;
return item;
Using LINQ on a SortedList
will not give you the benefit of the sort.
For optimal performance, you should write your own binary search.
OK, just to give this a little more visibility - here's a more concise version of Adam Robinson's answer:
return list.FirstOrDefault(kv => kv.Key >= i).Value;
The FirstOrDefault
function has an overload that accepts a predicate, which selects the first element satisfying a condition - you can use that to directly get the element you want, or null
if it doesn't exist.
Why not use the BinarySearch
that's built into the List
class?
var keys = list.Keys.ToList();
int index = keys.BinarySearch(i);
if (index < 0)
index = ~index;
var item = index < keys.Count ? list[keys[index]] : null;
return item;
If the search target isn't in the list, BinarySearch
returns the bit-wise complement of the next-higher item; we can use that to directly get you what you want by re-complementing the result if it's negative. If it becomes equal to the Count
, your search key was bigger than anything in the list.
This should be much faster than doing a LINQ
As comments have pointed out, the where
, since it's already sorted...ToList
call will force an evaluation of the whole list, so this is only beneficial if you do multiple searches without altering the underlying SortedList
, and you keep the keys
list around separately.
Using OrderedDictionary in PowerCollections you can get an enumerator that starts where they key you are looking for should be... if it's not there, you'll get the next closest node and can then navigate forwards/backwards from that in O(log N) time per nav call.
This has the advantage of you not having to write your own search or even manage your own searches on top of a SortedList.
精彩评论