
Stop words in Sitecore

Developer https://www.devze.com 2023-02-08 05:22 Source: web

We are using Lucene for text search as part of Sitecore. Is there any method to ignore stop words (like "a", "an", "the", ...) in the Sitecore search?


By default, Sitecore uses the Lucene standard analyzer, Lucene.Net.Analysis.Standard.StandardAnalyzer. You can see that it is defined in the /configuration/sitecore/search/analyzer element of the web.config file. One of the constructors of the StandardAnalyzer class accepts the array of strings it will treat as stop words. By default it uses a hardcoded list of stop words, which includes:

"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"

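To make the effect of this list concrete, here is a minimal sketch in plain C# with no Lucene dependency. It only approximates the analyzer: `StopWordDemo` and `RemoveStopWords` are hypothetical names introduced for illustration, and the real StandardAnalyzer tokenizes with its own grammar rather than splitting on whitespace.

```csharp
using System;
using System.Collections.Generic;

public static class StopWordDemo
{
    // The default stop words listed above.
    private static readonly HashSet<string> DefaultStopWords = new HashSet<string>
    {
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
        "the", "their", "then", "there", "these", "they", "this", "to", "was",
        "will", "with"
    };

    // Hypothetical helper: lowercases the text, splits it on whitespace,
    // and drops every token that appears in the stop-word list.
    public static List<string> RemoveStopWords(string text)
    {
        var separators = new[] { ' ', '\t', '\r', '\n' };
        var kept = new List<string>();
        foreach (var token in text.ToLowerInvariant()
                                  .Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!DefaultStopWords.Contains(token))
            {
                kept.Add(token);
            }
        }
        return kept;
    }

    public static void Main()
    {
        // "the", "of" and "in" are stop words, so they are dropped.
        // prints "history art europe"
        Console.WriteLine(string.Join(" ", RemoveStopWords("The history of art in Europe")));
    }
}
```

With the default list, a search for "the history of art" therefore matches on "history" and "art" only.
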
If you'd like to override this behavior, I think you should inherit from StandardAnalyzer and give your class a default constructor that takes the stop words from another source instead of the hardcoded array and passes them to the base constructor. You have various options for that source, including reading the words from a text file. Don't forget to replace the standard class with yours in web.config.

See other constructors of StandardAnalyzer class for more details. .NET Reflector is your friend here.
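
The text-file option mentioned above could look roughly like the following sketch. It assumes the `StandardAnalyzer(Version, Hashtable)` constructor of Lucene.Net 2.9; the class name `FileStopWordAnalyzer`, the helper `LoadStopWords`, and the file path are hypothetical and only illustrate the idea.

```csharp
using System.Collections;
using System.IO;
using Lucene.Net.Analysis.Standard;

public class FileStopWordAnalyzer : StandardAnalyzer
{
    // Hypothetical location; point this at your own file with one stop word per line.
    private const string StopWordFile = @"C:\data\stopwords.txt";

    // Parameterless constructor, so the class can be created from configuration.
    // The stop words are loaded before the base constructor runs.
    public FileStopWordAnalyzer()
        : base(Lucene.Net.Util.Version.LUCENE_29, LoadStopWords(StopWordFile))
    {
    }

    // Hypothetical helper: reads the file and returns a word -> word table.
    // Blank lines are skipped and words are lowercased.
    public static Hashtable LoadStopWords(string path)
    {
        var words = new Hashtable();
        foreach (var line in File.ReadAllLines(path))
        {
            var word = line.Trim().ToLowerInvariant();
            if (word.Length > 0)
            {
                words[word] = word;
            }
        }
        return words;
    }
}
```

Reading the file in a static helper keeps the constructor parameterless. Whether a file, a database, or a Sitecore item is the better source depends on who needs to maintain the list.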


An example for Yan's post:

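using System.Collections;

public class CaseAnalyzer : Lucene.Net.Analysis.Standard.StandardAnalyzer
{
   // Stop words for this analyzer. The table is left empty, so no word is treated as a stop word.
   // Initializing it with {{"by","by"}} would make "by" a stop word that the analyzer will not match.
   private static Hashtable stopWords = new Hashtable();

   // Parameterless constructor that hands the custom stop-word table to StandardAnalyzer.
   public CaseAnalyzer() : base(Lucene.Net.Util.Version.LUCENE_29, stopWords)
   {
   }
}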

This should be registered in web.config under:

/configuration/sitecore/search/analyzer

An example of the analyzer registration:

<caseanalyzer type="EBF.Business.Search.Analyzers.CaseAnalyzer, EBF.Business, Version=1.0.0.0, Culture=neutral"/>

Lastly, you just need to register your analyzer in the search configuration like this:

<Analyzer ref="search/caseanalyzer" />