
Sphinx and wordforms

Developer (https://www.devze.com), 2022-12-20 06:47, source: web

How could I make Sphinx to recognize "auto" and "car" as similar words?

Let's imagine I have three database records:

Andy likes to drive auto.
Mary don't like to drive car.
Bob is going to buy automobile.

Here are some sample queries and their results:

query: car
result: Mary don't like to drive car.
-------------------------------------
query: auto
result: Andy likes to drive auto.
-------------------------------------
query: automobile
result: Bob is going to buy automobile.

...but I want Sphinx to return:

query: car
result:
Andy likes to drive auto.
Mary don't like to drive car.
Bob is going to buy automobile.
-------------------------------------
query: auto
result:
Andy likes to drive auto.
Mary don't like to drive car.
Bob is going to buy automobile.
-------------------------------------
query: automobile
result:
Andy likes to drive auto.
Mary don't like to drive car.
Bob is going to buy automobile.

I know that Sphinx has stopwords, but what should I put into the stopwords dictionary to make Sphinx behave this way?

Thank you.


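All you have to do is give Sphinx a correctly formatted wordforms text file, referenced from the wordforms setting in your .conf file. Each line maps a source word to the word it should be indexed and searched as, for example:

The documentation is here: http://www.sphinxsearch.com/docs/manual-0.9.9.html#conf-wordforms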

auto > car
automobile > car
four-wheeled-vehicle-intended-for-public-roads > car
cars > car
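To see why a file like this produces the results asked for in the question, here is a toy Python model of the idea. It is a sketch, not Sphinx's implementation: the helper names (parse_wordforms, normalize, search) are hypothetical, tokenizing is simplified to lowercase word runs, and a record matches only if it contains every query word. The key point is that the same mapping is applied to the indexed text and to the query, so "auto", "automobile", and "car" all end up as the single token "car". The hyphenated entry is left out because this toy tokenizer would split it on the hyphens.

```python
import re

# Toy wordforms file in the "source > destination" format shown above.
WORDFORMS_TEXT = """\
auto > car
automobile > car
cars > car
"""

# The three records from the question.
RECORDS = [
    "Andy likes to drive auto.",
    "Mary don't like to drive car.",
    "Bob is going to buy automobile.",
]


def parse_wordforms(text):
    """Parse 'source > destination' lines into a dict (hypothetical helper)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ">" not in line:
            continue  # skip blank or malformed lines
        src, dst = (part.strip().lower() for part in line.split(">", 1))
        mapping[src] = dst
    return mapping


def normalize(text, wordforms):
    """Lowercase, split into word tokens, then replace each token by its wordform."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [wordforms.get(tok, tok) for tok in tokens]


def search(query, records, wordforms):
    """Return the records whose normalized tokens include every normalized query token."""
    wanted = set(normalize(query, wordforms))
    return [r for r in records if wanted <= set(normalize(r, wordforms))]


if __name__ == "__main__":
    wf = parse_wordforms(WORDFORMS_TEXT)
    for query in ("car", "auto", "automobile"):
        print(query, "->", search(query, RECORDS, wf))
```

With the mapping, each of the three queries returns all three records; with an empty mapping, "car" would match only Mary's record, which is the behaviour described in the question.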


Here is an example of using wordforms with the terms "gearing" and "leverage". In finance these are equivalent terms (both mean "financial leverage"), so they should be treated as synonyms.

Suppose your "wordforms.txt" file originally lists them like this:

gear > gear
geared > gear
gearing > gear
gears > gear
……
leverage > leverage
leveraged > leverage
leverages > leverage
leveraging > leverage

With this file the two words are not connected: each maps only to its own base form. To connect them, change "wordforms.txt" like this:

gear > leverage
geared > leverage
gearing > leverage
gears > leverage
……
leveraged > leverage
leverages > leverage
leveraging > leverage

This edit maps "gear" and all of its forms onto "leverage", which connects the two words. After editing "wordforms.txt", save it and re-index your indexes; the change does not take effect until the indexes are rebuilt.
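Making the edit by hand is easy for a few lines, but a small script can help when a word has many forms. The following Python sketch is one way to do it, assuming the plain "source > destination" format shown above. The function names redirect_wordforms and redirect_file are hypothetical helpers, not part of Sphinx. The script rewrites the destination of every entry that currently points at one word so that it points at another, and leaves all other lines (including "leverage > leverage" and any elided "……" lines) untouched.

```python
from pathlib import Path


def redirect_wordforms(lines, old_target, new_target):
    """Return lines where every 'word > old_target' entry now points at new_target.

    Hypothetical helper; matching on the destination is an exact string comparison.
    """
    out = []
    for line in lines:
        if ">" in line:
            src, dst = (part.strip() for part in line.split(">", 1))
            if dst == old_target:
                line = f"{src} > {new_target}"
        out.append(line)  # lines without '>' (blank or elided) are kept as-is
    return out


def redirect_file(path, old_target, new_target):
    """Apply redirect_wordforms to a wordforms file in place (hypothetical helper)."""
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines()
    new_lines = redirect_wordforms(lines, old_target, new_target)
    p.write_text("\n".join(new_lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    original = [
        "gear > gear",
        "geared > gear",
        "gearing > gear",
        "gears > gear",
        "leverage > leverage",
        "leveraged > leverage",
    ]
    for line in redirect_wordforms(original, "gear", "leverage"):
        print(line)
```

For a real file you would call redirect_file("wordforms.txt", "gear", "leverage") and then rebuild the indexes, for example with Sphinx's indexer tool (indexer --all --rotate if searchd is running).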

Now a search for "gearing" or "leverage" will match documents containing either word, in any of the forms listed in the file.
