开发者

Best way to design soundex based search

开发者 https://www.devze.com 2023-02-24 05:39 出处:网络
I have a table of forum posts, and want to improve the basic search functionality as we get a lot of users from all over the world who are not native English speakers and will have trouble finding res

I have a table of forum posts, and want to improve the basic search functionality as we get a lot of users from all over the world who are not native English speakers and will have trouble finding results when they spell incorrectly. The current forum search is exact.

Which of these designs will perform best? Assume the database has 500,000 records and the search is used frequently. Ideally I would like it to search every record.

Design One

Along side each forum post, I store soundex_post, which contains all the soundex data. When a search is run, it soundexes all search terms, and does a LIKE operation on the soundex fields.

Design Two

I normalise it. Every soundex code is stored in a new table, tblSoundexCodes. Then there is a table tblForumPostSoundexCodes:

ID | Post_ID | SoundexCode_ID | Count

Then when a soundex is searched for, I simply pull out all the Post_IDs where Sound开发者_如何学运维exCode_ID = n

Am I correct that method two will be considerably faster, but that it will be a lot harder to maintain (IE, when people edit posts).


Design Two is better.

Design two won't be faster. The data storage will be more compact, and you will have to update or insert a row into tblForumPostSoundexCodes, as well as insert a row into tblSoundexCodes, when someone writes or updates a post.

You'll have to verify that this soundex transaction processing takes place for every change to a post (create, update, delete).


I have some doubts about your idea.

Let's take your own answer to one of the comments:

"@Frank is MSSQL, probably only going to be express edition, and I used soundex as an example I'll probably be implementing a more up to date algorithm"

This simple comments has 15 words (I discarded those shorter than 4 characters). So you will have potentially 15 different records pointing here with value "1".

What if I, a non-native English speaker, decide to search for ("esample" and "dait")? Will you offer the ability to search for more than one word? With and-? or-? not-?

Wouldn't it be better to run the search as-is and provide a "maybe you are looking for 'Example and Date'?" like Google or Wikipedia do?

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号