开发者

how to sort multiple sorted data set from DB in C#?

开发者 https://www.devze.com 2023-04-07 15:59 出处:网络
I have a partitioned table in SQL SERVER, there is a clustered index on ID, and the table is partitioned by period_key. The ID is uniqu开发者_运维技巧e inside partition, but not unique cross partition

I have a partitioned table in SQL SERVER, there is a clustered index on ID, and the table is partitioned by period_key. The ID is uniqu开发者_运维技巧e inside partition, but not unique cross partitions. What I need is to find all the unique ID. The simplest way is just use

select unique ID from tab

But that need to sort the DB in database which need quit a lot of temp disk, so lots of disk IO is required. Since the system is already IO bounded, I am thinking about cut the disk IO. Since we can read each partition in order by using the cluster index, suppose we have 10 partition, we can read one row from each partition, then compare them, and output the record with the min ID, say from partition X, and then read the next row from the partition X. And again compare these 10 rows, output the record with the min ID, etc. Just like external sort.

I don't have experience in C#, but know java. Could anyone give me some idea how to implement it in c#?


OK, if the requirement is to bypass the sort on the DB server side, and rather work out if an ID is unique or not on the client side, you can do something like this - select all ID values (no distinct in the query):

SELECT ID FROM tab

Then loop through all the values, adding them to a List. When that's done, you can ask the list to give back a version of itself with the duplicates removed. Here's a simplistic example:

List<int> allIDs = new List<int>();
foreach (DataRow row in someDataSet.Tables[0].Rows)
{
    allIDs.Add((int) row["ID"]);
}
List<int> uniqueIDs = allIDs.Distinct();

Disclaimer - I wrote that off the top of my head, so it might contain errors. This post contains a faster implementation based on HashSet.

0

精彩评论

暂无评论...
验证码 换一张
取 消