
Solr Tokenizer Question

Developer https://www.devze.com 2023-04-05 17:47 Source: web

I have what I think is a simple solr exercise, but I'm unsure what to use.

I have a field of names, e.g. Joe Smith and Jack Daniels and Steve. Each value could be one name or two names. I want to be able to search this field such that if you search for "Danie" you get everything that has a first or last name that starts with "Danie". Three example returns would be "Danielle", "Steven Daniels", and "Danier Daniellson".

I would also like it so that the preference is given to the first name.

So my two questions are: do I need to use a copyField and break up the names into first and last name? And what would my analyzer look like?

Edit: Two edits on the searching ability. 1. Something like "Joe S" should return all users that look like "Joe S*" 2. If a user searches with an "&" character, that should be included in the search and not used as an operator.


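To solve the first part, I suggest the following solution:

Index your field twice:

  • once with solr.KeywordTokenizerFactory, which indexes the entire field value as a single token. It is not split into tokens. This is useful for boosting results so that preference is given to the first name.
  • once with a tokenizer that splits the value into words, such as solr.WhitespaceTokenizerFactory or solr.StandardTokenizerFactory. (There is no WordDelimiterTokenizerFactory; solr.WordDelimiterFilterFactory is a token filter that can follow one of these tokenizers.)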
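To see why the two fields behave differently, here is a rough Python simulation of the idea. It does not call Solr; `keyword_tokens` and `word_tokens` are hypothetical stand-ins for the two tokenizers, and the lowercasing is an assumption about how the fields would be analyzed.

```python
def keyword_tokens(value):
    # Stand-in for solr.KeywordTokenizerFactory: the whole value is one token.
    return [value]


def word_tokens(value):
    # Rough stand-in for a word-splitting tokenizer: split on whitespace.
    return value.split()


def prefix_hits(names, prefix, tokenize):
    # Keep every name that has at least one token starting with the prefix.
    p = prefix.lower()
    return [n for n in names if any(t.lower().startswith(p) for t in tokenize(n))]


names = ["Danielle", "Steven Daniels", "Danier Daniellson", "Joe Smith"]

# Only values that *begin* with the prefix, i.e. first-name matches.
first_name_hits = prefix_hits(names, "Danie", keyword_tokens)

# Values where any word (first or last name) begins with the prefix.
any_name_hits = prefix_hits(names, "Danie", word_tokens)
```

The keyword-style field only matches when the value itself starts with the prefix, which is what makes it usable as a "first name" signal, while the word-split field also catches last names.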

You can find more about these tokenizers here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

After you have indexed the names in two fields with different tokenizers, use a boosted query to rank matches from one field (the keyword-tokenized one, which favors the first name) above matches from the other, as explained here: http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_make_.22superman.22_in_the_title_field_score_higher_than_in_the_subject_field
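As a minimal sketch of such a boosted query, the helper below builds a query string in the style of the FAQ's `title:superman^2 subject:superman` example. The field names `name_exact` and `name_tokens`, the boost of 2, and the helper names are assumptions for illustration, not part of any schema shown here.

```python
# Characters with special meaning in the Lucene query syntax, plus the space,
# so that multi-word input such as "Joe S" stays a single prefix term.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/ ')


def escape_term(text):
    # Backslash-escape every special character in the user's input.
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def build_name_query(user_input, exact_field="name_exact",
                     token_field="name_tokens", boost=2):
    # Prefix query against both fields; the keyword field gets the boost.
    prefix = escape_term(user_input.strip()) + "*"
    return f"{exact_field}:{prefix}^{boost} {token_field}:{prefix}"


single = build_name_query("Danie")
multi = build_name_query("Joe S")
```

With input like "Joe S", the escaped space keeps the prefix as one term, so it would presumably only match in the keyword-tokenized field, where the whole name is a single token.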


If a user searches with an "&" character, that should be included in the search and not used as an operator.

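For this part, either use the DisMax query parser (http://wiki.apache.org/solr/DisMaxQParserPlugin), which treats most query-syntax characters as literal text, or escape the character yourself. When you make the request over HTTP, send the character URL-encoded as "%26" instead of a raw &, because a raw & separates request parameters. You also need a tokenizer such as solr.WhitespaceTokenizerFactory so that characters like & are kept inside the tokens.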
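A small sketch of the encoding step, using Python's standard library. The base URL and the choice of parameters are assumptions for illustration; only the encoding of `&` as `%26` is the point.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and path to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_request_url(user_query):
    # urlencode percent-encodes "&" inside values, so it reaches Solr as
    # part of the query text instead of splitting the request parameters.
    params = {"q": user_query, "defType": "dismax"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_request_url("Smith&Wesson")
```

Building the query string through a library call like this avoids hand-escaping and also handles spaces and other reserved characters.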
