
How can I make Sphinx ignore some characters?

Developer https://www.devze.com 2023-03-07 13:46 Source: Internet

I'm making a PHP website with MySQL backend and Sphinx as a search engine. Say, I have an item with the designer "Ray-Ban" and I need to get it as a result when the user types "ray ban" or "rayban". Should there be an exclusion list somewhere?


The standard way to do this is the charset_table option. charset_table defines which characters are treated as word characters; every other character acts as a separator between words.

For example, with this charset_table:

index YOUR_INDEX_NAME
{
    charset_table = 0..9, A..Z->a..z, _, a..z
}

the following text

My best friend is Hoo-foo but not Pe_ter.!!! That's all.

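is split into these tokens:

my best friend is hoo foo but not pe_ter that s all

In the same way, "Ray-Ban" is indexed as the two words ray and ban, so it matches the query "ray ban".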
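To make the tokenizing rule concrete, here is a rough Python model of what that charset_table does. It is only a sketch under the assumption that the listed characters are the whole story; the real Sphinx tokenizer handles many more cases, and the names CHARSET and tokenize are hypothetical.

```python
import string
from typing import Dict, List

# Simplified model of: charset_table = 0..9, A..Z->a..z, _, a..z
# (an illustration, not Sphinx's real tokenizer).
# Characters in the table are word characters; A..Z are folded to a..z.
CHARSET: Dict[str, str] = {c: c for c in string.digits + "_" + string.ascii_lowercase}
CHARSET.update({c: c.lower() for c in string.ascii_uppercase})


def tokenize(text: str) -> List[str]:
    """Split text into words; any character not in CHARSET is a separator."""
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        mapped = CHARSET.get(ch)
        if mapped is None:
            # Not a word character: close the word being built, if any.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(mapped)
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("My best friend is Hoo-foo but not Pe_ter.!!! That's all."))
print(tokenize("Ray-Ban"))  # ['ray', 'ban']
```

Because "-" is not in the table, it behaves exactly like a space, which is why Hoo-foo and Ray-Ban each become two words.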


Your best bet is probably an exceptions file (the exceptions index option), although that means you'll need to know in advance every case where you want two different words or phrases to be treated as the same.
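The trade-off can be illustrated with a rough Python model of an exceptions list: every spelling you anticipate is rewritten to one canonical token before tokenizing. This is a hedged sketch of the idea only, not Sphinx's actual exceptions file format or matching rules; the EXCEPTIONS entries and the normalize function are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical exception list: each spelling you anticipate is rewritten
# to one canonical token before tokenizing. Entries are illustrative.
EXCEPTIONS: Dict[str, str] = {
    "Ray-Ban": "rayban",
    "Ray Ban": "rayban",
    "ray ban": "rayban",
}


def normalize(text: str) -> List[str]:
    """Apply the exception list, then split into lowercase words."""
    # Longer patterns first, so a longer entry wins over a shorter one it contains.
    for src in sorted(EXCEPTIONS, key=len, reverse=True):
        text = text.replace(src, f" {EXCEPTIONS[src]} ")
    # Simplified tokenizer: runs of letters, digits and underscores.
    return re.findall(r"[a-z0-9_]+", text.lower())


print(normalize("Ray-Ban sunglasses"))  # ['rayban', 'sunglasses']
print(normalize("ray ban"))             # ['rayban']
print(normalize("RAY BAN"))             # ['ray', 'ban'] (spelling not listed)
```

The last call shows the drawback: any spelling that is not listed falls through to ordinary tokenizing, so the list has to anticipate every variant users might type.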


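As of version 0.9.8, there is a per-index option named ignore_chars that lists characters to be removed from the text entirely. Unlike a separator, an ignored character is simply dropped, so "Ray-Ban" is indexed as rayban.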

For example:

index YOUR_INDEX {
        charset_type = utf-8
        ignore_chars = -
}

More information is available on the Sphinx website: http://sphinxsearch.com/docs/manual-0.9.8.html#conf-ignore-chars
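A rough Python model shows how ignore_chars differs from treating "-" as a separator. It is a sketch under simplifying assumptions (ASCII only, lowercase folding, letters, digits and underscore as word characters), not Sphinx's implementation; IGNORE_CHARS and tokenize are hypothetical names.

```python
import re
from typing import List, Set

# Simplified model of: ignore_chars = -
# Ignored characters are deleted before splitting into words, so they
# join the pieces around them instead of separating them.
IGNORE_CHARS: Set[str] = {"-"}


def tokenize(text: str) -> List[str]:
    """Drop ignored characters, then split into lowercase words."""
    stripped = "".join(ch for ch in text if ch not in IGNORE_CHARS)
    # Everything else that is not a letter, digit or underscore separates words.
    return re.findall(r"[a-z0-9_]+", stripped.lower())


print(tokenize("Ray-Ban"))  # ['rayban']
print(tokenize("rayban"))   # ['rayban']
print(tokenize("ray ban"))  # ['ray', 'ban']
```

In this model the indexed "Ray-Ban" and the query "rayban" produce the same token, while the query "ray ban" still becomes two words, so this option on its own covers the "rayban" spelling but not "ray ban".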

Side note: the example in the Sphinx documentation uses U+AD to remove soft hyphens. For some reason that didn't work for me, but the example I gave above worked fine.

