I'm pretty happy with the following method. It takes an enumerable and a list of sorted, disjoint ranges and skips items not in the ranges. If the ranges are null, we just walk every item. The enumerable and the list of ranges are both possibly large. We want this method to be as high performance as possible.
Can someone think of a more elegant piece of code? I'm primarily interested in C# implementations, but if someone has a three-character APL implementation, that's cool too.
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
Debug.Assert(ranges == null || ranges.Count > 0);
int currentItem = 0;
Pair<int, int> currentRange = new Pair<int, int>();
int currentRangeIndex = -1;
bool betweenRanges = false;
if (ranges != null)
{
currentRange = ranges[0];
currentRangeIndex = 0;
betweenRanges = currentRange.First > 0;
}
foreach (T item in source)
{
开发者_运维百科 if (ranges != null) {
if (betweenRanges) {
if (currentItem == currentRange.First)
betweenRanges = false;
else {
currentItem++;
continue;
}
}
}
yield return item;
if (ranges != null) {
if (currentItem == currentRange.Second) {
if (currentRangeIndex == ranges.Count - 1)
break; // We just visited the last item in the ranges
currentRangeIndex = currentRangeIndex + 1;
currentRange = ranges[currentRangeIndex];
betweenRanges = true;
}
}
currentItem++;
}
}
Maybe use linq on your source
something like:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
if(ranges == null)
return null;
return source.Where((item, index) => ranges.Any(y => y.First < index && y.Second > index)).AsEnumerable();
}
I don't have my Windows PC in front of me and I'm not sure I understood your code correctly, but I tried to understand your text instead and the code above could work.... or something like it.
UPDATED: Regarding the performance issue I would recommend you to test the performance with some simple test and time both of the functions.
You could copy the source list to an array and then for each range, you can block copy from your new source array to a target array in the proper location. If you can get your source collection passed in as an array, that would make this an even better approach. If you do have to do the initial copy, it is O(N) for that operation plus O(M) where M is the total number of items in the final array. So it ends up coming out to O(N) in either case.
Here's my take. I find it easier to understand, if not more elegant.
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Tuple<int, int>> ranges)
{
if (ranges == null)
return source;
Debug.Assert(ranges.Count > 0);
return WalkRangesInternal(source, ranges);
}
static IEnumerable<T> WalkRangesInternal<T>(IEnumerable<T> source, List<Tuple<int, int>> ranges)
{
int currentItem = 0;
var rangeEnum = ranges.GetEnumerator();
bool moreData = rangeEnum.MoveNext();
using (var sourceEnum = source.GetEnumerator())
while (moreData)
{
// skip over every item in the gap between ranges
while (currentItem < rangeEnum.Current.Item1
&& (moreData = sourceEnum.MoveNext()))
currentItem++;
// yield all the elements in the range
while (currentItem <= rangeEnum.Current.Item2
&& (moreData = sourceEnum.MoveNext()))
{
yield return sourceEnum.Current;
currentItem++;
}
// advance to the next range
moreData = rangeEnum.MoveNext();
}
}
How about this (untested)? Should have pretty similar performance characteristics (pure streaming, no unnecessary buffering, quick exit), but is easier to follow, IMO:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source,
List<Pair<int, int>> ranges)
{
if (source == null)
throw new ArgumentNullException("source");
// If ranges is null, just return the source. From spec.
return ranges == null ? source : RangeIterate(source, ranges);
}
private static IEnumerable<T> RangeIterate<T>(IEnumerable<T> source,
List<Pair<int, int>> ranges)
{
// The key bit: a lazy sequence of all valid indices belonging to
// each range. No buffering.
var validIndices = from range in ranges
let start = Math.Max(0, range.First)
from validIndex in Enumerable.Range(start, range.Second - start + 1)
select validIndex;
int currentIndex = -1;
using (var indexErator = validIndices.GetEnumerator())
{
// Optimization: Get out early if there are no ranges.
if (!indexErator.MoveNext())
yield break;
foreach (var item in source)
{
if (++currentIndex == indexErator.Current)
{
// Valid index, yield.
yield return item;
// Move to the next valid index.
// Optimization: get out early if there aren't any more.
if (!indexErator.MoveNext())
yield break;
}
}
}
}
If you don't mind buffering indices, you can do something like this, which is even more clearer, IMO:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source,
List<Pair<int, int>> ranges)
{
if (source == null)
throw new ArgumentNullException("source");
if (ranges == null)
return source;
// Optimization: Get out early if there are no ranges.
if (!ranges.Any())
return Enumerable.Empty<T>();
var validIndices = from range in ranges
let start = Math.Max(0, range.First)
from validIndex in Enumerable.Range(start, range.Second - start + 1)
select validIndex;
// Buffer the valid indices into a set.
var validIndicesSet = new HashSet<int>(validIndices);
// Optimization: don't take an item beyond the last index of the last range.
return source.Take(ranges.Last().Second + 1)
.Where((item, index) => validIndicesSet.Contains(index));
}
You could iterate over the collection manually to prevent the enumerator from getting the current item when it will be skipped:
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
Debug.Assert(ranges == null || ranges.Count > 0);
int currentItem = 0;
Pair<int, int> currentRange = new Pair<int, int>();
int currentRangeIndex = -1;
bool betweenRanges = false;
if (ranges != null)
{
currentRange = ranges[0];
currentRangeIndex = 0;
betweenRanges = currentRange.First > 0;
}
using (IEnumerator<T> enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext())
{
if (ranges != null)
{
if (betweenRanges)
{
if (currentItem == currentRange.First)
betweenRanges = false;
else
{
currentItem++;
continue;
}
}
}
yield return enumerator.Current;
if (ranges != null)
{
if (currentItem == currentRange.Second)
{
if (currentRangeIndex == ranges.Count - 1)
break; // We just visited the last item in the ranges
currentRangeIndex = currentRangeIndex + 1;
currentRange = ranges[currentRangeIndex];
betweenRanges = true;
}
}
currentItem++;
}
}
}
My second try, this will consider the ordering of the ranges. I haven' tried it yet but I thinkt it works :). You could probably extract some of the code to smaller functions to make it more readable.
public static IEnumerable<T> WalkRanges<T>(IEnumerable<T> source, List<Pair<int, int>> ranges)
{
int currentIndex = 0;
int currentRangeIndex = 0;
int maxRangeIndex = ranges.Length;
bool done = false;
foreach(var item in source)
{
if(currentIndex > range[currentRangeIndex].Second)
{
while(currentIndex > range[currentRangeIndex].Second)
{
if(!++currentRangeIndex < maxRangeIndex)
{
// We've passed last range =>
// set done = true to break outer loop and then break
done = true;
break;
}
}
if(currentIndex > range[currentRangeIndex].First)
yield item; // include if larger than first since we now it's smaller than second
}
else if(currentIndex > range[currentRangeIndex].First)
{
// If higher than first and lower than second we're in range
yield item;
}
if(done) // if done break outer loop
break;
currentIndex++; // always increase index when advancint through source
}
}
精彩评论