开发者

Mark non-unique rows in a DataTable

开发者 https://www.devze.com 2023-01-28 08:30 出处:网络
I have a DataTable which I want to check if values in three of the columns are unique. If not, the last column should be filled with the line number of the first appearance of the value-combination.

I have a DataTable which I want to check if values in three of the columns are unique. If not, the last column should be filled with the line number of the first appearance of the value-combination.

For example, this table:

ID    Name    LastName    Age    Flag
-------------------------------------
1     Bart    Simpson     10      -
2     Lisa    Simpson      8      -
3     Bart    Simpson     10      -
4     Ned     Flanders    40      -
5     Bart    Simpson     10      -

Should lead to this result:

Line  Name    LastName    Age    Flag
-------------------------------------
1     Bart    Simpson     10      -
2     Lisa    Simpson      8      -
3     Bart    Simpson     10      1
4     Ned     Flanders    40      -
5     Bart    Simpso开发者_StackOverflown     10      1

I solved this by iterating the DataTable with two nested for loops and comparing the values. While this works fine for a small amount of data, it gets pretty slow when the DataTable contains a lot of rows.

My question is: What is the best/fastest solution for this problem, regarding that the amount of data can vary between let's say 100 and 20000 rows?

Is there a way to do this using LINQ? (I'm not too familiar with it, but I want to learn!)


I can't comment on how you might do this in C#/VB with a data table, but if you could move it all to SQL, your query would look like:

declare @t table (ID int, Name varchar(10), LastName varchar(10), Age int)
insert into @t values (1,     'Bart' ,   'Simpson',     10 )
insert into @t values (2,     'Lisa',    'Simpson' ,     8 )
insert into @t values (3,     'Bart',    'Simpson' ,    10 )
insert into @t values (4,     'Ned',     'Flanders' ,   40 )
insert into @t values (5 ,    'Bart',    'Simpson'   ,  10 )

select t.*,
(select min(ID) as ID
    from @t t2
    where t2.Name = t.Name
    and t2.LastName = t.LastName
    and t2.id < t.id)
from @t t

Here I've defined a table for demo purposes. I suppose you might be able to translate this into LINQ.


Okay, I think I got an answer myself. Based on the suggestion in James Wiseman's answer, I tried something with LINQ.

Dim myErrnrFnct = Function( current, first) If(first <> current, first, 0)
Dim myQuery = From row As DataRow In myDt.AsEnumerable _
                      Select New With { _
                        .LINE = row.Item("LINE"), _
                        .NAME = row.Item("NAME"), _
                        .LASTNAME = row.Item("LASTNAME"), _
                        .AGE = row.Item("AGE"), _
                        .FLAG = myErrnrFnct(row.Item("LINE"), myDt.AsEnumerable.First(Function(rowToCheck) _
                                                                                        rowToCheck.Item("NAME") = row.Item("NAME") AndAlso _
                                                                                        rowToCheck.Item("LASTNAME") = row.Item("LASTNAME") AndAlso _
                                                                                        rowToCheck.Item("AGE") = row.Item("AGE")).Item("LINE")) _
                      }

With this query I get exactly the result that's described in the Question. The myErrnrFnct Function is necessary because I want the Flag column to have the value 0 if there is no other row with the same values.

To get a DataTable out of myQuery again, I had to add some extensions described here:
How to: Implement CopyToDataTable Where the Generic Type T Is Not a DataRow
And then, this line will do:

Dim myNewDt As DataTable = myQuery.CopyToDataTable()

This seems to work just fine. Any suggestions to do this better?

0

精彩评论

暂无评论...
验证码 换一张
取 消