Linq to Sql: Select distinct rows while ignoring specified columns_问答_开发者

Linq to Sql: Select distinct rows while ignoring specified columns

开发者 https://www.devze.com 2023-02-15 08:44 出处：网络

If I have a query/results set as follows... from t1 in table1 join t2 in table2 on t1.ID equals t2.ID join t3 in table3 on t2.ID equals t3.ID

If I have a query/results set as follows...

from t1 in table1
join t2 in table2 on t1.ID equals t2.ID
join t3 in table3 on t2.ID equals t3.ID
select new MyClass()
{
   ID = 开发者_开发问答t1.ID,
   Name = t2.Name,
   Rank = t2.Rank,
   City = t3.City
}

ID | Name | City | Rank
01 | Test | Fake | 876
01 | Test | Fake | 755
02 | Blah | Fake | 765

Doing a .Distinct() will return all 3 records, but what if I want the first and third record and don't want to eliminate any columns from my results set? Is there a way to specify columns to ignore when doing a distinct, or explicitly include in a distinct so that I could do something like this...

// pseudo code
.Distinct(o => o.Name.FirstOrDefault())

I think it's possible to do this using group by, but that seems like it would be slow and messy, especially if I needed to group on multiple columns in order to not exclude too many rows. Any ideas? Thanks.

No, you can't specify a projection by which to determine equality for Distinct. (I have extension methods to do this for LINQ to Objects, but that's a different matter.)

I suggest you use GroupBy, possibly in conjunction with First:

var query = from t1 in table1
            join t2 in table2 on t1.ID equals t2.ID
            join t3 in table3 on t2.ID equals t3.ID
            select new { t1.ID, t2.Name, t2.Rank, t3.City } into tuple
            group tuple by new { tuple.ID, tuple.Name, tuple.City } into grouping
            let first = grouping.First() // For convenience
            select new MyClass
            {
                ID = first.ID, City = first.City,
                Name = first.Name, Rank = first.Rank
            };

Note that the results of this may not be deterministic - you're taking the first item of each group, ignoring the rank - so you may get the record with rank 876 or you may get the one with rank 755 in your example.

I don't know exactly what this will look like in SQL - but you should check via logging, and then run the query profiler to see what the cost will be like.