I'm making a PHP website with MySQL backend and Sphinx as a search engine. Say, I have an item with the designer "Ray-Ban" and I need to get it as a result when the user types "ray ban" or "rayban". Should there be an exclusion list somewhere?
The standard way to do this is the charset_table option. charset_table defines which characters are kept as part of a token; any character that is not listed acts as a separator,
i.e. with this charset_table
index YOUR_INDEX_NAME
{
charset_table = 0..9, A..Z->a..z, _, a..z
}
text like this
My best friend is Hoo-foo but not Pe_ter.!!! That's all.
is parsed into these tokens
my best friend is hoo foo but not pe_ter that s all
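To see what this means for "Ray-Ban", here is a rough Python sketch that imitates the splitting described above. It is only an approximation for illustration, not how Sphinx is implemented, and the function name tokenize is made up for this example.

```python
import re

def tokenize(text):
    """Imitate charset_table = 0..9, A..Z->a..z, _, a..z (approximation only).

    Runs of listed characters become tokens, every other character
    (space, hyphen, apostrophe, punctuation) acts as a separator,
    and A..Z is folded to a..z.
    """
    return [t.lower() for t in re.findall(r"[0-9A-Za-z_]+", text)]

print(tokenize("My best friend is Hoo-foo but not Pe_ter.!!! That's all."))
print(tokenize("Ray-Ban"))  # the hyphen splits it into two tokens
```

So with the hyphen left out of charset_table, "Ray-Ban" should be indexed as the two words "ray" and "ban", which is what a query for "ray ban" tokenizes to as well. A query for "rayban" is a single different token, so this setting alone would not be expected to match it.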
Your best bet is probably the exceptions file - although that means you'll need to know every case where you want two different words/phrases to be treated the same.
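As of version 0.9.8 there is a per-index option named ignore_chars that works as an exclusion list: the listed characters are dropped from the text entirely instead of being treated as separators.
e.g.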
index YOUR_INDEX {
charset_type = utf-8
ignore_chars = -
}
More information available on the Sphinx website: http://sphinxsearch.com/docs/manual-0.9.8.html#conf-ignore-chars
Side note: they show using U+AD to remove soft-hyphens in their example. For some reason this didn't work for me, but the example I gave above worked fine.
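For comparison, a rough Python sketch of the difference between dropping the hyphen and splitting on it. Again this only imitates the behaviour for illustration, it is not Sphinx's code; tokenize and its ignore_chars parameter are names invented for the example.

```python
import re

def tokenize(text, ignore_chars=""):
    """Approximate tokenizer: ignored characters are removed before splitting."""
    # Drop ignored characters first, so the letters around them join up.
    for ch in ignore_chars:
        text = text.replace(ch, "")
    # Everything outside 0-9, A-Z, a-z and _ acts as a separator.
    return [t.lower() for t in re.findall(r"[0-9A-Za-z_]+", text)]

print(tokenize("Ray-Ban"))                    # hyphen as separator: two tokens
print(tokenize("Ray-Ban", ignore_chars="-"))  # hyphen ignored: one token
print(tokenize("ray ban", ignore_chars="-"))  # a space is still a separator
```

If Sphinx behaves like this sketch, ignore_chars = - makes "Ray-Ban" match a query for "rayban" but not "ray ban", because the space still splits the query into two tokens. Covering both spellings would then probably also need something like the exceptions file mentioned in the other answer.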