
SQL to detect similar records in the same database table

Developer https://www.devze.com 2023-01-20 10:37 Source: Internet
I have a requirement to loop through the records in a database table and group items that have similar content. I want to match on a single column in the database and, if there are similar records, extract the ID of each row and save it to another table. For example, if I had 10 similar rows, they would all be linked to one "header" record in another table.

Below is some simple pseudocode to illustrate what I need to do:

For Each record in table
    If there is a similar record in header table Then
        Link this record to matching header table record
    Else
        Create new header record and link this record
    End If
End For

I'm using MSSQL 2008 with Full Text Search, which will provide the mechanism I need to pick out similar records. At the moment I'm planning to write the for loop in C# code and do the matching and saving in SQL by calling a stored procedure to check for a matching record.

Something is telling me this should all be done in a single stored procedure (and something else is telling me to keep the logic in the code!).

Is there a neater way of doing this in SQL?


Databases are really good at dealing with distinct pieces of information. They are not so good at dealing with quasi-distinct information.

That said, see whether the SOUNDEX function works well enough for grouping similar inputs.

And, for the love of god, don't use anything like this in a production environment.
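To try out the SOUNDEX suggestion, a minimal sketch along these lines might help. The table variable @Items and its columns Id and Title are hypothetical stand-ins for the real table, and the sample rows are made up. SOUNDEX produces only a four-character code, so it is a coarse measure that tends to suit short values such as names rather than long text.

```sql
-- Hypothetical sample data; @Items stands in for the real table.
DECLARE @Items TABLE (Id INT PRIMARY KEY, Title VARCHAR(100));

INSERT INTO @Items (Id, Title)
VALUES (1, 'Smith'), (2, 'Smyth'), (3, 'Robert'), (4, 'Rupert'), (5, 'Jones');

-- Rows that share a SOUNDEX code are treated as "similar".
-- DENSE_RANK assigns one group number per distinct code.
SELECT Id,
       Title,
       SOUNDEX(Title) AS SoundexCode,
       DENSE_RANK() OVER (ORDER BY SOUNDEX(Title)) AS GroupNo
FROM @Items
ORDER BY GroupNo, Id;
```

With this data, Smith and Smyth should land in one group and Robert and Rupert in another, which makes it easy to judge by eye whether the grouping is good enough for the real column.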


Here is an example that finds exact duplicates; try adapting it to your needs.

SELECT email,
       COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING COUNT(email) > 1
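Since the question asks for the ID of each matching row, the grouped query can be joined back to the table. The sketch below assumes an id column, which the query above does not show, and uses a table variable @users with made-up rows in place of the real users table. Like the query above, it only catches exact matches.

```sql
-- Hypothetical sample data; @users stands in for the users table,
-- and the id column is an assumption.
DECLARE @users TABLE (id INT PRIMARY KEY, email VARCHAR(255));

INSERT INTO @users (id, email)
VALUES (1, 'a@example.com'), (2, 'b@example.com'),
       (3, 'a@example.com'), (4, 'c@example.com');

-- Join each row to the set of duplicated emails to list the row IDs.
SELECT u.id, u.email
FROM @users AS u
JOIN (
    SELECT email
    FROM @users
    GROUP BY email
    HAVING COUNT(email) > 1
) AS d ON d.email = u.email
ORDER BY u.email, u.id;
```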


You may want to look into the MERGE statement that is new in SQL Server 2008. See, for example: Inserting, Updating, and Deleting Data by Using MERGE.
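As a rough illustration of how MERGE could replace the row-by-row loop, the sketch below creates any missing header rows in one statement and then links the items in a second. All table and column names are hypothetical, and SOUNDEX is used here only as a stand-in match key; the Full Text Search matching described in the question would need its own way of producing a key or a candidate header.

```sql
-- Hypothetical tables; the names and the SOUNDEX match key are assumptions.
DECLARE @Items   TABLE (Id INT PRIMARY KEY, Title VARCHAR(100));
DECLARE @Headers TABLE (HeaderId INT IDENTITY(1,1) PRIMARY KEY,
                        MatchKey CHAR(4) UNIQUE);
DECLARE @Links   TABLE (ItemId INT PRIMARY KEY, HeaderId INT NOT NULL);

INSERT INTO @Items (Id, Title)
VALUES (1, 'Smith'), (2, 'Smyth'), (3, 'Jones');

-- Step 1: add a header row for every match key that has none yet.
MERGE @Headers AS h
USING (SELECT DISTINCT SOUNDEX(Title) AS MatchKey FROM @Items) AS s
    ON h.MatchKey = s.MatchKey
WHEN NOT MATCHED BY TARGET THEN
    INSERT (MatchKey) VALUES (s.MatchKey);

-- Step 2: link every item that is not linked yet to its header.
INSERT INTO @Links (ItemId, HeaderId)
SELECT i.Id, h.HeaderId
FROM @Items AS i
JOIN @Headers AS h ON h.MatchKey = SOUNDEX(i.Title)
WHERE NOT EXISTS (SELECT 1 FROM @Links AS l WHERE l.ItemId = i.Id);

SELECT ItemId, HeaderId FROM @Links ORDER BY ItemId;
```

Both steps are set-based, so they could sit in a single stored procedure and be rerun safely: existing headers and links are left alone.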


You can write a stored procedure and schedule it to run with a maintenance plan, or you can use embedded C# code on SQL Server (CLR integration), which makes it easier to build better matching algorithms on the database side. Alternatively, you can write a Windows service for a batch processing job that runs regularly.
