ToTitleCase in Solr to stop SCREAMING CAPS

开发者 (Developer) https://www.devze.com 2022-12-19 11:43 Source: Internet

I'm using Solr's faceting and I've run into a problem that I was hoping I could get around using filters.

Basically, sometimes a town name will come through to Solr as

"CAMBRIDGE"

and sometimes it will come through as

"Cambridge"

I wanted to use a filter in Solr to stop the SCREAMING CAPS version of the town name. It seems there is a filter to make all the text lower case:

<!-- A text field that only sorts out casing for faceting -->
<fieldType name="text_facet" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I was wondering if anyone knew of a filter which will leave the first character of each word alone and lowercase the rest of the characters, e.g.

  • CAMBRIDGE >> Cambridge
  • KingsTON Upon HULL >> Kingston Upon Hull

etc

Alternatively, if it's easy to write your own filters, some help on how to do that would be appreciated. I'm not a Java person.

Thanks


AFAIK there is no built-in filter like that. If you want to write it, see LowerCaseFilterFactory and LowerCaseFilter for reference; it doesn't seem to be very hard.
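To give an idea of how little logic such a filter would need, here is a rough sketch of the per-token transformation in plain Java. TitleCaseSketch and its methods are made-up names, not Solr or Lucene APIs. In an actual Lucene TokenFilter the same loop would presumably run inside incrementToken() over the term's character buffer, much as LowerCaseFilter does for lowercasing, and a matching factory class would be needed to reference it from the schema.

```java
// Hypothetical helper: the core logic a custom title-casing filter could apply
// to each token. Not part of Solr or Lucene.
final class TitleCaseSketch {

    // Upper-cases the first char and lower-cases the rest, in place.
    // Works char by char, so it does not handle supplementary code points;
    // that should be fine for typical town names.
    static void titleCaseInPlace(char[] buffer, int length) {
        if (length == 0) {
            return;
        }
        buffer[0] = Character.toUpperCase(buffer[0]);
        for (int i = 1; i < length; i++) {
            buffer[i] = Character.toLowerCase(buffer[i]);
        }
    }

    // Convenience wrapper for trying the logic on a plain string.
    static String titleCase(String token) {
        char[] buffer = token.toCharArray();
        titleCaseInPlace(buffer, buffer.length);
        return new String(buffer);
    }

    public static void main(String[] args) {
        System.out.println(titleCase("CAMBRIDGE")); // Cambridge
        System.out.println(titleCase("KingsTON"));  // Kingston
    }
}
```

Because the method works on a single token, it fits the WhitespaceTokenizerFactory setup from the question, where "KingsTON Upon HULL" arrives as three separate tokens.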

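Or you could do this client-side, e.g. in SolrNet you could write an ISolrOperations decorator that applies the necessary transformations after the real query, using ToTitleCase. Note that .NET's TextInfo.ToTitleCase leaves all-uppercase words unchanged, so the value would need lowercasing first.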
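To illustrate the client-side route (in Java rather than SolrNet, and with the made-up names FacetNormalizer, titleCaseWords and mergeFacetCounts), a decorator could title-case each facet label and add up the counts of labels that collapse to the same value. The input map is only a stand-in for whatever facet structure the client library actually returns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper: normalises facet labels and merges their counts.
final class FacetNormalizer {

    // Upper-cases the first letter of every whitespace-separated word
    // and lower-cases the remaining letters.
    static String titleCaseWords(String value) {
        StringBuilder out = new StringBuilder(value.length());
        boolean startOfWord = true;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isWhitespace(c)) {
                startOfWord = true;
                out.append(c);
            } else if (startOfWord) {
                out.append(Character.toUpperCase(c));
                startOfWord = false;
            } else {
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }

    // Merges counts of labels that differ only by casing,
    // e.g. "CAMBRIDGE" (3) and "Cambridge" (5) become "Cambridge" (8).
    static Map<String, Long> mergeFacetCounts(Map<String, Long> rawCounts) {
        Map<String, Long> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : rawCounts.entrySet()) {
            merged.merge(titleCaseWords(entry.getKey()), entry.getValue(), Long::sum);
        }
        return merged;
    }
}
```

The merged counts are only complete if Solr returned every case variant of a value, so the facet.limit setting may need attention with this approach.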


Perhaps you could make use of the solr.PatternReplaceCharFilterFactory?

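<fieldType name="textCharNorm" class="solr.TextField">
  <analyzer>
    <!-- char filters always run first, on the raw text -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="([^\s]{1})([^\s]*)" replaceWith="\U$1\L$2"/>
    <!-- an analyzer needs a tokenizer; whitespace matches the question's field type -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- token filters run last, so this lower-cases whatever the charFilter produced -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>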

Note that I haven't tested this code or solr.PatternReplaceCharFilterFactory, so I'm not sure it works. If you need to build your own filter, this guide might be useful:

http://robotlibrarian.billdueber.com/building-a-solr-text-filter-for-normalizing-data/
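
One caveat with the pattern above: as far as I know, java.util.regex replacement strings have no \U or \L case operators (a backslash there just escapes the next character), so the case conversion may have to happen in code instead. A minimal sketch reusing the same regular expression, assuming Java 9 or later for Matcher.replaceAll with a function argument; RegexTitleCase is a made-up name:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing the intended effect of the pattern in plain Java.
final class RegexTitleCase {

    // Same expression as in the fieldType: the first non-space character,
    // followed by the rest of the word.
    private static final Pattern WORD = Pattern.compile("([^\\s]{1})([^\\s]*)");

    static String titleCase(String input) {
        Matcher matcher = WORD.matcher(input);
        // The callback builds each replacement; quoteReplacement keeps any
        // '$' or '\' in the text from being read as replacement syntax.
        return matcher.replaceAll(result -> Matcher.quoteReplacement(
                result.group(1).toUpperCase(Locale.ROOT)
                        + result.group(2).toLowerCase(Locale.ROOT)));
    }
}
```

Locale.ROOT is used so the result does not depend on the server's default locale.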

// John
