Solr filter or tokenizer to make combinations of words

开发者 https://www.devze.com 2023-04-08 01:02 Source: Internet

I'm trying to implement a reasonable name suggest feature using a series of filters. At the moment I have

    <fieldType name="suggester" class="solr.TextField" positionIncrementGap="1" autoGeneratePhraseQueries="true">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                    catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
            <filter class="solr.ASCIIFoldingFilterFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.ShingleFilterFactory" outputUnigramsIfNoShingles="true" maxShingleSize="2"
                    outputUnigrams="true"/>
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                    catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
            <filter class="solr.ASCIIFoldingFilterFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.ShingleFilterFactory" outputUnigramsIfNoShingles="true" maxShingleSize="2"
                    outputUnigrams="true"/>
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
        </analyzer>
    </fieldType>

This certainly needs more tuning, but this question is about one particular aspect. For the input string "mark daniel sievers", the above matches queries on "mark" and "sievers". What I really want is to reduce the verbosity of the EdgeNGramFilter, because it causes overmatching, and to use a filter or tokenizer that can combine words in some configurable manner. For example, for the input "mark daniel rex sievers" it would create the tokens "mark sievers", "mark daniel sievers", "mark rex sievers" and so on. I didn't apply any particular algorithm to that, but I'm wondering whether such a beast already exists (it almost certainly does) or whether it's best to write my own as a filter plugin.

Solr 3.3.0
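
To make the desired output concrete, here is a minimal Python sketch of one possible reading of the combinations described above. It is not Solr code, and the rule it applies (always keep the first and last word, plus any order-preserving subset of the middle words) is an assumption, since the question does not fix an algorithm. The function name name_combinations is hypothetical.

```python
from itertools import combinations


def name_combinations(text):
    """Build phrases from a name under an ASSUMED rule: keep the first and
    last word, and insert every order-preserving subset of the middle words."""
    words = text.lower().split()
    if len(words) < 2:
        # Nothing to combine for empty or single-word input.
        return words
    first, middle, last = words[0], words[1:-1], words[-1]
    phrases = []
    # Grow the number of middle words from 0 up to all of them.
    for size in range(len(middle) + 1):
        for subset in combinations(middle, size):
            phrases.append(" ".join([first, *subset, last]))
    return phrases


if __name__ == "__main__":
    for phrase in name_combinations("mark daniel rex sievers"):
        print(phrase)
```

For "mark daniel rex sievers" this yields "mark sievers", "mark daniel sievers", "mark rex sievers" and "mark daniel rex sievers". The number of phrases doubles with every extra middle word, so any real filter built on this idea would likely need a cap on input length.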

I'd use a ShingleFilter: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory

For example:

<filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>

Input: mark daniel sievers

Tokens produced: mark, mark daniel, mark daniel sievers, daniel, daniel sievers, sievers
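
The token list above can be reproduced with a small Python simulation of shingling. This is only a sketch of the idea, not the Solr implementation. The function name shingles and its parameters are hypothetical, chosen to mirror the maxShingleSize and outputUnigrams attributes. It also shows that shingles are built from adjacent words only, so a non-contiguous pair such as "mark sievers" is not among the tokens.

```python
def shingles(text, max_shingle_size=3, output_unigrams=True):
    """Simulate word shingling: emit every run of adjacent words up to
    max_shingle_size long, optionally including the single words."""
    words = text.split()
    min_size = 1 if output_unigrams else 2
    tokens = []
    for start in range(len(words)):
        for size in range(min_size, max_shingle_size + 1):
            if start + size > len(words):
                break  # the run would extend past the end of the input
            tokens.append(" ".join(words[start:start + size]))
    return tokens


if __name__ == "__main__":
    tokens = shingles("mark daniel sievers")
    print(tokens)
    # Shingles only join adjacent words, so skipping "daniel" never happens.
    print("mark sievers" in tokens)
```

If tokens that skip middle words are required, shingling alone does not provide them, which is the case where a custom filter would come into play.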
