
Find all intersecting data, not just the unique values

Developer https://www.devze.com 2022-12-19 07:59 Source: web

I thought that I understood Intersect, but it turns out I was wrong.

 List<int> list1 = new List<int>() { 1, 2, 3, 2, 3};
 List<int> list2 = new List<int>() { 2, 3, 4, 3, 4};

 list1.Intersect(list2) =>      2,3

 //But what I want is:
 // =>  2,3,2,3,2,3,3

I can figure out a way like this:

 var intersected = list1.Intersect(list2);
 var list3 = new List<int>();
 list3.AddRange(list1.Where(I => intersected.Contains(I)));
 list3.AddRange(list2.Where(I => intersected.Contains(I)));

Is there an easier way in LINQ to achieve this?

I do need to state that I do not care in which order the results are given.

2,2,2,3,3,3,3 would also be perfectly OK.

The problem is that I am using this on a very large collection, so I need efficiency.

We are talking about objects, not ints. The ints were just for the easy example, but I realize this can make a difference.


Let's see if we can precisely characterize what you want. Correct me if I am wrong. You want: all elements of list 1, in order, that also appear in list 2, followed by all elements of list 2, in order, that also appear in list 1. Yes?

Seems straightforward.

return list1.Where(x=>list2.Contains(x))
     .Concat(list2.Where(y=>list1.Contains(y)))
     .ToList();

Note that this is not efficient for large lists. If the lists have a thousand items each then this does a couple million comparisons. If you're in that situation then you want to use a more efficient data structure for testing membership:

var list1set = new HashSet<int>(list1);
var list2set = new HashSet<int>(list2);

return list1.Where(x=>list2set.Contains(x))
     .Concat(list2.Where(y=>list1set.Contains(y)))
     .ToList();

which only does a couple thousand comparisons, but potentially uses more memory.
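
Since the question mentions that the real elements are objects rather than ints, here is a minimal, self-contained sketch of the same hash-set approach for an arbitrary element type. The `Item` record, the `DuplicateIntersect` class and the `IntersectKeepingDuplicates` name are hypothetical, introduced only for illustration, and the sketch assumes C# 9 or later. `HashSet<T>` relies on `Equals`/`GetHashCode` (or on a supplied `IEqualityComparer<T>`), so the element type needs sensible equality for this to work.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; a record gets value-based Equals/GetHashCode.
public record Item(int Id);

public static class DuplicateIntersect
{
    // Returns the items of first that occur in second, followed by the items
    // of second that occur in first. Duplicates are kept. Pass a comparer if
    // T does not define suitable equality on its own.
    public static List<T> IntersectKeepingDuplicates<T>(
        IReadOnlyCollection<T> first,
        IReadOnlyCollection<T> second,
        IEqualityComparer<T>? comparer = null)
    {
        // A null comparer makes HashSet<T> fall back to the default equality.
        var firstSet = new HashSet<T>(first, comparer);
        var secondSet = new HashSet<T>(second, comparer);

        return first.Where(x => secondSet.Contains(x))
                    .Concat(second.Where(y => firstSet.Contains(y)))
                    .ToList();
    }
}
```

With two lists of `Item` built from the ids in the question, the call `DuplicateIntersect.IntersectKeepingDuplicates(list1, list2)` should yield items with the ids 2,3,2,3,2,3,3. For a class that does not override `Equals` and `GetHashCode`, the default comparison is by reference, so a custom comparer would be needed.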


var set = new HashSet<int>(list1.Intersect(list2));
return list1.Concat(list2).Where(i=>set.Contains(i));


Maybe this could help: https://gist.github.com/mladenb/b76bcbc4063f138289243fb06d099dda

The original Except/Intersect return a collection of unique items, even though their contract doesn't state so (the return value of those methods isn't a HashSet/Set, but rather an IEnumerable), which is probably the result of a poor design decision. Instead, we can use a more intuitive implementation that returns every matching element of the first enumeration, duplicates included, rather than just one of each (using Set.Contains).

Furthermore, a mapping function was added to help intersect/except collections of different types.

If you don't need to intersect/except collections of different types, just inspect the source code of the Intersect/Except and change the part which iterates through the first enumeration to use Set.Contains instead of Set.Add/Set.Remove.
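
The idea described above can be sketched roughly as follows. This is not the code from the linked gist; the names `DuplicateKeepingSetOps`, `IntersectAll` and `ExceptAll` and the key-selector parameters are assumptions made for illustration. The only essential point is that the loop over the first sequence uses `Contains`, which leaves the set untouched, so repeated elements are yielded every time they occur.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateKeepingSetOps
{
    // Yields every element of first whose key also occurs among the keys of second.
    public static IEnumerable<TFirst> IntersectAll<TFirst, TSecond, TKey>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TKey> firstKey,
        Func<TSecond, TKey> secondKey)
    {
        var keys = new HashSet<TKey>();
        foreach (var item in second)
            keys.Add(secondKey(item));

        foreach (var item in first)
            if (keys.Contains(firstKey(item)))   // Contains, not Remove: duplicates survive
                yield return item;
    }

    // Yields every element of first whose key does not occur among the keys of second.
    public static IEnumerable<TFirst> ExceptAll<TFirst, TSecond, TKey>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TKey> firstKey,
        Func<TSecond, TKey> secondKey)
    {
        var keys = new HashSet<TKey>();
        foreach (var item in second)
            keys.Add(secondKey(item));

        foreach (var item in first)
            if (!keys.Contains(firstKey(item)))  // Contains, not Add: duplicates survive
                yield return item;
    }
}
```

For collections of the same type, both key selectors can simply be `x => x`. Note that this covers only one direction; to get the result asked for in the question, the call would be made once in each direction and the two results concatenated.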


I don't believe this is possible with the built-in APIs. But you could use the following to get the result you're looking for.

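// Must be declared inside a static class to be usable as an extension method.
static IEnumerable<T> Intersect2<T>(this IEnumerable<T> left, IEnumerable<T> right) {
  // Key: each distinct item of left. Value: whether it was also seen in right.
  var map = new Dictionary<T, bool>();
  foreach ( var item in left ) {
    map[item] = false;  // indexer assignment tolerates duplicates; ToDictionary would throw
  }
  foreach ( var item in right ) {
    if ( map.ContainsKey(item) ) {
      map[item] = true;
    }
  }
  // Yield only items flagged as present in both sequences, duplicates included.
  foreach ( var cur in left.Concat(right) ) {
    if ( map.TryGetValue(cur, out var inBoth) && inBoth ) {
      yield return cur;
    }
  }
}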