I have a partitioned table in SQL SERVER, there is a clustered index on ID, and the table is partitioned by period_key. The ID is uniqu开发者_运维技巧e inside partition, but not unique cross partitions. What I need is to find all the unique ID. The simplest way is just use
select unique ID from tab
But that need to sort the DB in database which need quit a lot of temp disk, so lots of disk IO is required. Since the system is already IO bounded, I am thinking about cut the disk IO. Since we can read each partition in order by using the cluster index, suppose we have 10 partition, we can read one row from each partition, then compare them, and output the record with the min ID, say from partition X, and then read the next row from the partition X. And again compare these 10 rows, output the record with the min ID, etc. Just like external sort.
I don't have experience in C#, but know java. Could anyone give me some idea how to implement it in c#?
OK, if the requirement is to bypass the sort on the DB server side, and rather work out if an ID is unique or not on the client side, you can do something like this - select all ID values (no distinct in the query):
SELECT ID FROM tab
Then loop through all the values, adding them to a List
. When that's done, you can ask the list to give back a version of itself with the duplicates removed. Here's a simplistic example:
List<int> allIDs = new List<int>();
foreach (DataRow row in someDataSet.Tables[0].Rows)
{
allIDs.Add((int) row["ID"]);
}
List<int> uniqueIDs = allIDs.Distinct();
Disclaimer - I wrote that off the top of my head, so it might contain errors. This post contains a faster implementation based on HashSet
.
精彩评论