
LINQ Combine Queries

开发者 https://www.devze.com 2023-01-24 01:24 Source: web

I have two collections of objects of different types. Let's call them type ALPHA and type BRAVO. Each of these types has a property that is the "ID" for the object. No ID is duplicated within a collection, so for any given ID there is at most one ALPHA and one BRAVO instance. What I need to do is divide them into three categories:

  1. ALPHA instances whose ID does not appear in the BRAVO collection;
  2. BRAVO instances whose ID does not appear in the ALPHA collection;
  3. Instances whose ID appears in both collections.

In all 3 cases, I need to have the actual objects from the collections at hand for subsequent manipulation.

For case #3, I know I can do something like:

 var myCorrelatedItems = myAlphaItems.Join(myBravoItems, alpha => alpha.Id, beta => beta.Id, (outer, inner) => new
            {
                alpha = outer,
                beta = inner
            });

I can also write code for cases #1 and #2 that looks something like:

var myUnmatchedAlphas = myAlphaItems.Where(alpha=>!myBravoItems.Any(bravo=>alpha.Id==bravo.Id));

And similarly for the unmatched bravos. Unfortunately, this would scan the collection of bravos (which may be very large!) once for every alpha, and the collection of alphas (which may also be very large!) once for every bravo.

Is there any way to unify these query concepts so as to minimize iteration over the lists? These collections can have thousands of items.


If you are only interested in the IDs,

var alphaIds = myAlphaItems.Select(alpha => alpha.Id);
var bravoIds = myBravoItems.Select(bravo => bravo.Id);

var alphaIdsNotInBravo = alphaIds.Except(bravoIds);
var bravoIdsNotInAlpha = bravoIds.Except(alphaIds);

If you want the alphas and bravos themselves,

// assumes the Id property is an int
var alphaIdsSet = new HashSet<int>(alphaIds);
var bravoIdsSet = new HashSet<int>(bravoIds);

var alphasNotInBravo = myAlphaItems
                       .Where(alpha => !bravoIdsSet.Contains(alpha.Id));

var bravosNotInAlpha = myBravoItems
                       .Where(bravo => !alphaIdsSet.Contains(bravo.Id));
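The code above covers cases #1 and #2. As a minimal, self-contained sketch of the same HashSet idea extended to case #3, assuming int IDs, the snippet below uses value tuples as hypothetical stand-ins for the ALPHA and BRAVO types so that it runs as a single file of C# top-level statements. Each collection is scanned once to build its ID set, and every later membership test is a hash lookup instead of a nested scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's ALPHA and BRAVO objects:
// value tuples that each carry an int Id plus some payload.
var myAlphaItems = new List<(int Id, string Name)>
{
    (1, "alpha-1"), (2, "alpha-2"), (3, "alpha-3")
};
var myBravoItems = new List<(int Id, string Tag)>
{
    (2, "bravo-2"), (3, "bravo-3"), (4, "bravo-4")
};

// One pass over each collection to build its set of IDs.
var alphaIdsSet = new HashSet<int>(myAlphaItems.Select(x => x.Id));
var bravoIdsSet = new HashSet<int>(myBravoItems.Select(x => x.Id));

// Each Contains call is an O(1) average hash lookup, so there are no nested scans.
var alphasNotInBravo = myAlphaItems.Where(x => !bravoIdsSet.Contains(x.Id)).ToList(); // case #1
var bravosNotInAlpha = myBravoItems.Where(x => !alphaIdsSet.Contains(x.Id)).ToList(); // case #2
var alphasInBoth = myAlphaItems.Where(x => bravoIdsSet.Contains(x.Id)).ToList();      // case #3, alpha side
var bravosInBoth = myBravoItems.Where(x => alphaIdsSet.Contains(x.Id)).ToList();      // case #3, bravo side

Console.WriteLine(string.Join(", ", alphasNotInBravo.Select(x => x.Name))); // alpha-1
Console.WriteLine(string.Join(", ", bravosNotInAlpha.Select(x => x.Tag)));  // bravo-4
Console.WriteLine(string.Join(", ", alphasInBoth.Select(x => x.Name)));     // alpha-2, alpha-3
```

Because a `Where` query is re-evaluated every time it is enumerated, the results are materialised with `ToList()` so the source collections are not scanned again each time the results are used.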

EDIT: A few other options:

  1. The ExceptBy method from MoreLinq.
  2. The Enumerable.ToDictionary method.
  3. If both types implement a common interface or inherit from a common base type (e.g. an IHasId interface), you could write your own IEqualityComparer<T> implementation; Enumerable.Except has an overload that accepts an equality comparer as a parameter.


Sometimes LINQ is not the answer. This is the kind of problem where I would consider using a HashSet<T> with a custom comparer to reduce the work of performing set operations. HashSets are much more efficient at set operations than lists: a membership test is an average O(1) hash lookup rather than a linear scan, so (depending on the data) this can reduce the work considerably:

using System.Collections.Generic;
using System.Linq;

// create a wrapper class that can accommodate either an Alpha or a Bravo
class ABItem { 
   public object Instance   { get; private set; }
   public int Id            { get; private set; }
   public ABItem( Alpha a ) { Instance = a; Id = a.Id; }
   public ABItem( Bravo b ) { Instance = b; Id = b.Id; }
}

// equality comparer that treats two ABItems as equal when their IDs match,
// whether they wrap an Alpha or a Bravo
class ABItemComparer : IEqualityComparer<ABItem> {
   public bool Equals( ABItem x, ABItem y ) { 
       if( ReferenceEquals(x, y) ) return true;
       if( x == null || y == null ) return false;
       return x.Id == y.Id;
   }

   public int GetHashCode( ABItem item ) {
       return item.Id.GetHashCode();
   }
}

// create a comparer based on comparing the IDs of ABItems
var comparer = new ABItemComparer(); 

var hashAlphas = 
    new HashSet<ABItem>(myAlphaItems.Select(x => new ABItem(x)), comparer);

var hashBravos = 
    new HashSet<ABItem>(myBravoItems.Select(x => new ABItem(x)), comparer);

// items with common IDs in Alpha and Bravo sets; IntersectWith modifies the
// set in place (it returns void) and keeps that set's own elements, so
// hashCommon holds the Alpha wrappers and hashCommonBravos the Bravo wrappers
var hashCommon = new HashSet<ABItem>(hashAlphas, comparer);
hashCommon.IntersectWith( hashBravos );
var hashCommonBravos = new HashSet<ABItem>(hashBravos, comparer);
hashCommonBravos.IntersectWith( hashAlphas );

hashAlphas.ExceptWith( hashCommon );  // items only in Alpha
hashBravos.ExceptWith( hashCommon );  // items only in Bravo


// index each collection by Id (one pass over each)
Dictionary<int, Alpha> alphaDictionary = myAlphaItems.ToDictionary(a => a.Id);
Dictionary<int, Bravo> bravoDictionary = myBravoItems.ToDictionary(b => b.Id);

// classify every distinct Id as "both", "alpha" (alpha only) or "bravo" (bravo only)
ILookup<string, int> keyLookup = alphaDictionary.Keys
  .Union(bravoDictionary.Keys)
  .ToLookup(x => alphaDictionary.ContainsKey(x) ?
    (bravoDictionary.ContainsKey(x) ? "both" : "alpha") :
    "bravo");

// indexing an ILookup with a missing key yields an empty sequence, so these never throw
List<Alpha> alphaBoth = keyLookup["both"].Select(x => alphaDictionary[x]).ToList();
List<Bravo> bravoBoth = keyLookup["both"].Select(x => bravoDictionary[x]).ToList();

List<Alpha> alphaOnly = keyLookup["alpha"].Select(x => alphaDictionary[x]).ToList();
List<Bravo> bravoOnly = keyLookup["bravo"].Select(x => bravoDictionary[x]).ToList();


Here is one possible LINQ solution that performs a full outer join on both sets and tags each ID with the group it belongs to. This solution might lose its luster, however, when you try to separate the groups into different variables; it really depends on what kind of actions you need to perform on these objects. At any rate, this ran at what I thought was an acceptable speed (0.5 seconds) for me on lists of 5000 items:

var q =
  from g in
  (from id in myAlphaItems.Select(a => a.Id).Union(myBravoItems.Select(b => b.Id))
  join a in myAlphaItems on id equals a.Id into ja
  from a in ja.DefaultIfEmpty()
  join b in myBravoItems on id equals b.Id into jb
  from b in jb.DefaultIfEmpty()
  // a null side means that ID is missing from that collection
  select new
  {
      ID = id,
      Group = a == null ? "Bravo Only" : (b == null ? "Alpha Only" : "Both")
  })
  group g.ID by g.Group;

You can remove the 'group by' query or create a dictionary from this (q.ToDictionary(x => x.Key, x => x.Select(y => y))), or whatever! This is simply a way of categorizing your items. I'm sure there are better solutions out there, but this seemed like a truly interesting question so I thought I might as well give it a shot!
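The query above keeps only the ID and its group label, so the Alpha and Bravo objects would have to be looked up again afterwards. A sketch of a variant that keeps the matched objects in each row follows. It assumes the items are reference types (anonymous types play that role here as hypothetical sample data), because `DefaultIfEmpty()` then supplies `null` for the missing side, and that `null` is what the group label is derived from.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for the ALPHA and BRAVO collections;
// anonymous types are reference types, so DefaultIfEmpty() yields null.
var myAlphaItems = new[]
{
    new { Id = 1, Name = "alpha-1" },
    new { Id = 2, Name = "alpha-2" },
    new { Id = 3, Name = "alpha-3" }
};
var myBravoItems = new[]
{
    new { Id = 2, Tag = "bravo-2" },
    new { Id = 3, Tag = "bravo-3" },
    new { Id = 4, Tag = "bravo-4" }
};

// Full outer join on Id that keeps the objects, not just the ID.
var rows =
    from id in myAlphaItems.Select(x => x.Id).Union(myBravoItems.Select(x => x.Id))
    join alpha in myAlphaItems on id equals alpha.Id into ja
    from a in ja.DefaultIfEmpty()
    join bravo in myBravoItems on id equals bravo.Id into jb
    from b in jb.DefaultIfEmpty()
    select new
    {
        Id = id,
        Alpha = a,
        Bravo = b,
        Group = a == null ? "Bravo Only" : (b == null ? "Alpha Only" : "Both")
    };

// Run the query once and split the rows by group.
var byGroup = rows.ToLookup(r => r.Group);

foreach (var r in byGroup["Both"])
    Console.WriteLine($"{r.Id}: {r.Alpha.Name} / {r.Bravo.Tag}");
```

`ToLookup` enumerates the query a single time; indexing the lookup with a group that has no members returns an empty sequence rather than throwing.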


I think LINQ is not the best answer to this problem if you want to traverse and compare the minimum number of times. The following iterative solution, which makes a single pass over each collection, should perform better, and I believe code readability doesn't suffer.

// one pass over the alphas: index them by Id
var dictUnmatchedAlphas = myAlphaItems.ToDictionary(a => a.Id);
var myCorrelatedItems = new List<AlphaAndBravo>();
var myUnmatchedBravos = new List<Bravo>();

// one pass over the bravos: each lookup is O(1) on average
foreach (Bravo b in myBravoItems)
{
    Alpha a;
    if (dictUnmatchedAlphas.TryGetValue(b.Id, out a))
    {
        dictUnmatchedAlphas.Remove(b.Id); // whatever remains are the unmatched alphas
        myCorrelatedItems.Add(new AlphaAndBravo { a = a, b = b });
    }
    else
    {
        myUnmatchedBravos.Add(b);
    }
}
// dictUnmatchedAlphas.Values now holds the alphas with no matching bravo

Definition of AlphaAndBravo:

    public class AlphaAndBravo {
        public Alpha a { get; set; }
        public Bravo b { get; set; }
    } 
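For reference, here is a self-contained sketch of the same one-dictionary, one-loop approach, written as a hypothetical generic `Partition` helper so it does not depend on the real `Alpha`/`Bravo` classes (value tuples stand in for them). It assumes .NET Core 2.0 or later for `Dictionary<TKey,TValue>.Remove(key, out value)`, which performs the lookup and the removal in one call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; tuples stand in for the Alpha and Bravo classes.
var myAlphaItems = new List<(int Id, string Name)>
{
    (1, "alpha-1"), (2, "alpha-2"), (3, "alpha-3")
};
var myBravoItems = new List<(int Id, string Tag)>
{
    (2, "bravo-2"), (3, "bravo-3"), (4, "bravo-4")
};

var (onlyAlpha, onlyBravo, matched) =
    Partition(myAlphaItems, x => x.Id, myBravoItems, x => x.Id);

// Prints "alpha-2 <-> bravo-2" and then "alpha-3 <-> bravo-3".
foreach (var (a, b) in matched)
    Console.WriteLine($"{a.Name} <-> {b.Tag}");

// Hypothetical helper: one pass over the alphas to build the dictionary,
// one pass over the bravos to match against it. IDs are assumed unique
// within each collection (ToDictionary throws on duplicate keys).
static (List<TA> OnlyA, List<TB> OnlyB, List<(TA A, TB B)> Both) Partition<TA, TB>(
    IEnumerable<TA> alphas, Func<TA, int> alphaId,
    IEnumerable<TB> bravos, Func<TB, int> bravoId)
{
    var unmatchedAlphas = alphas.ToDictionary(alphaId);
    var onlyB = new List<TB>();
    var pairs = new List<(TA A, TB B)>();

    foreach (var bravo in bravos)
    {
        // Remove(key, out value) returns false when the key is absent.
        if (unmatchedAlphas.Remove(bravoId(bravo), out var alpha))
            pairs.Add((alpha, bravo));
        else
            onlyB.Add(bravo);
    }

    // Alphas that were never removed had no Bravo with the same Id.
    return (unmatchedAlphas.Values.ToList(), onlyB, pairs);
}
```

The dictionary doubles as the "unmatched alphas" collection, which is why each matched entry is removed as soon as it is paired.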