
search query "alien vs predator"

Developer https://www.devze.com 2023-01-17 19:04 Source: Internet

How do you make a search for "alien vs predator" also return results containing the string "alienS vs predator", with the "S"?

Example: http://www.torrentz.com/search?q=alien+vs+predator

How have they implemented this?

Is this advanced search engine stuff?


This is known as Word Stemming. When the text is indexed, words are "stemmed" to their "roots". So fighting becomes fight, skiing becomes ski, runs becomes run, etc. The same thing is done to the text that a user enters at search time, so when the search terms are compared to the values in the index, they match.
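As a rough illustration of stemming both the indexed text and the query, here is a minimal Python sketch. The `naive_stem` function is a hypothetical, deliberately crude suffix stripper written for this example; it is not the algorithm Lucene or any real search engine uses, and it will mis-stem plenty of English words.

```python
import re


def naive_stem(word: str) -> str:
    """Strip a few common suffixes. Illustration only, not a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words like "vs" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> list[str]:
    """Split text into lowercase tokens and stem each one."""
    return [naive_stem(token) for token in re.findall(r"[a-z0-9]+", text.lower())]


def matches(query: str, title: str) -> bool:
    """True if every stemmed query term also appears in the stemmed title."""
    return set(normalize(query)) <= set(normalize(title))


print(normalize("alienS vs predator"))
print(matches("alien vs predator", "AlienS vs Predator"))
```

Because the same `normalize` step runs on the stored title and on the search terms, "alien" and "alienS" end up as the same token before they are compared.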

The Lucene project supports this. I wouldn't consider it an advanced feature, especially given the expectations that Google has set.


Checking for plurals is a form of stemming. Stemming is a common feature of search engines and other text-matching tools. See the Wikipedia page http://en.wikipedia.org/wiki/Stemming for a host of stemming algorithms.


Typically when one sets up a search engine to search for text, one will construct a query that's something like:

SELECT * FROM TBLMOVIES WHERE NAME LIKE '%ALIEN%'

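This means that the substring ALIEN can appear anywhere in the NAME field, so you'll get back strings like ALIENS. Note that this is plain substring matching rather than stemming, so it also returns any other name that happens to contain ALIEN.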
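To try the LIKE approach without setting up a database server, here is a small sketch using Python's built-in sqlite3 module. The table contents are made up for the example, and SQLite is only a stand-in for whatever database you actually use; its LIKE is case-insensitive for ASCII letters by default, which may differ from yours.

```python
import sqlite3

# In-memory database with a few made-up movie titles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblmovies (name TEXT)")
conn.executemany(
    "INSERT INTO tblmovies (name) VALUES (?)",
    [
        ("Alien vs Predator",),
        ("Aliens vs Predator: Requiem",),
        ("Predator",),
        ("The Alienist",),
    ],
)

# Bind the pattern as a parameter instead of pasting user input into the SQL.
term = "alien"
rows = conn.execute(
    "SELECT name FROM tblmovies WHERE name LIKE ? ORDER BY name",
    (f"%{term}%",),
).fetchall()
names = [row[0] for row in rows]

for name in names:
    print(name)
```

The query returns both "Alien" and "Aliens" titles, but it also returns "The Alienist", because LIKE only checks for the substring.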


When words are indexed, they are indexed by their root form. For example, "aliens", "alien", "alien's" and "aliens'" are all stored as "alien".

When a query is run, the search engine likewise searches for only the root form "alien".

A widely used algorithm for this is the Porter Stemming Algorithm. You can download an implementation for your favorite language here: http://tartarus.org/~martin/PorterStemmer/


This is a basic feature of a search engine, rather than just a program that matches your query with a set of pre-defined results.

If you have the time, this is a great read, all about different algorithms, and how they are implemented.


You could try using soundex() as a fuzzy match on your strings. If you store the soundex code alongside the title and then compare it against a prefix of the search term's soundex code using LIKE 'XXX%', you should get a decent match. The longer the prefix you compare, the closer the strings have to match.

See the docs: http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_soundex
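To show why comparing a soundex prefix helps here, below is a small Python sketch of the classic four-character American Soundex. It is an assumption-laden stand-in: MySQL's SOUNDEX() is not guaranteed to produce identical codes (it can return longer strings, for one), so treat this as an illustration of the idea rather than a reimplementation of the MySQL function.

```python
# Map each consonant to its Soundex digit; vowels, y, h and w have no digit.
CODES = {}
for digit, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """Classic 4-character Soundex code (not MySQL's exact variant)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


print(soundex("alien"), soundex("aliens"))
# Comparing only a prefix of the code tolerates the trailing "s".
print(soundex("alien")[:3] == soundex("aliens")[:3])
```

The full codes for "alien" and "aliens" differ in the last position, which is why the comparison uses a prefix (the LIKE 'XXX%' idea above) rather than an exact match.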
