Django-Haystack with Solr contains search

Developer https://www.devze.com 2023-03-12 10:56 Source: Internet
I am using Haystack in a project with Solr as the backend. I want to be able to perform a "contains" search, similar to Django's .filter(something__contains="...").

The __startswith option does not suit our needs because, as the name suggests, it looks for words that start with the string.

I tried to use something like *keyword*, but Solr does not allow * to be used as the first character.

Thanks.


To get "contains" functionality you can use:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="back"/>
<filter class="solr.LowerCaseFilterFactory" />

as the index analyzer.

This will create ngrams for every whitespace separated word in your field. For example:

"Index this!" => x, ex, dex, ndex, index, !, s!, is!, his!, this!

As you can see, this will expand your index greatly, but if you now enter a query like:

"nde*"

it will match "ndex" giving you a hit.
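
To make the expansion concrete, here is a minimal pure-Python sketch that imitates the analyzer chain above (whitespace tokenizer, back-edge n-grams, lowercasing) and a trailing-wildcard lookup. The function names are made up for illustration, and the matching logic is only a rough model of what Solr does, not its actual implementation:

```python
def back_edge_ngrams(text, min_gram=1, max_gram=100):
    """Rough model of the index analyzer: split on whitespace, take the
    suffix ("back" edge) n-grams of each token, then lowercase them.
    Assumes min_gram >= 1."""
    terms = []
    for token in text.split():
        longest = min(len(token), max_gram)
        for size in range(min_gram, longest + 1):
            terms.append(token[-size:].lower())
    return terms


def prefix_query_matches(query, terms):
    """Rough model of a trailing-wildcard query such as nde*:
    a hit means at least one indexed term starts with the prefix."""
    prefix = query.rstrip("*").lower()
    return any(term.startswith(prefix) for term in terms)


terms = back_edge_ngrams("Index this!")
print(terms)
# ['x', 'ex', 'dex', 'ndex', 'index', '!', 's!', 'is!', 'his!', 'this!']
print(prefix_query_matches("nde*", terms))  # True, because of the term "ndex"
```

Every substring of a word is a prefix of one of its suffixes, which is why suffix n-grams combined with a trailing wildcard behave like a "contains" search.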

Use this approach carefully to make sure that your index doesn't get too large. If you increase minGramSize or decrease maxGramSize, the index will not expand as much, but the "contains" functionality is reduced. For instance, with minGramSize="3", queries shorter than 3 characters may no longer match.
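
As a back-of-the-envelope check on index growth, the number of edge n-grams produced for a single token can be computed directly. This is a small sketch with a hypothetical helper name, assuming one term is emitted per gram length between minGramSize and maxGramSize:

```python
def terms_per_token(token_length, min_gram, max_gram):
    """Number of edge n-grams emitted for one token of the given length,
    assuming one gram per length from min_gram up to max_gram."""
    longest = min(token_length, max_gram)
    return max(0, longest - min_gram + 1)


# The 5-character token "index" under different settings:
print(terms_per_token(5, 1, 100))  # 5 terms: x, ex, dex, ndex, index
print(terms_per_token(5, 3, 100))  # 3 terms: dex, ndex, index
print(terms_per_token(5, 3, 4))    # 2 terms: dex, ndex
print(terms_per_token(2, 3, 100))  # 0 terms: token shorter than min_gram
```

Note that under this model a token shorter than minGramSize produces no terms at all, so very short words may become unsearchable in that field.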


You can achieve the same behavior without having to touch the Solr schema. In your index, make your text field an EdgeNgramField instead of a CharField. Under the hood, this will generate a schema similar to what lindstromhenrik suggested.


I am using an expression like: .filter(something__startswith='...') .filter_or(name=''+s'...'), as it seems Solr does not like expressions like '...*', but combining them with an OR will do.


None of the answers here do a real substring search (*keyword*).

They don't find a keyword that is part of a bigger string (that is, neither a prefix nor a suffix).

Using EdgeNGramFilterFactory or the EdgeNgramField in the indexes can only do "startswith" or "endswith" filtering.
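
The difference can be illustrated with a small pure-Python sketch. The helper names are hypothetical, and the sketch assumes the query is matched as an exact term (no wildcard) against the indexed grams:

```python
def prefix_grams(word, min_gram=1):
    """Front-edge n-grams: every prefix of the word."""
    return {word[:n] for n in range(min_gram, len(word) + 1)}


def suffix_grams(word, min_gram=1):
    """Back-edge n-grams: every suffix of the word."""
    return {word[-n:] for n in range(min_gram, len(word) + 1)}


def substring_grams(word, min_gram=1):
    """Plain n-grams: every contiguous substring of the word."""
    return {
        word[start:start + n]
        for n in range(min_gram, len(word) + 1)
        for start in range(len(word) - n + 1)
    }


word = "keyword"
middle = "eywor"  # neither a prefix nor a suffix of "keyword"

print(middle in prefix_grams(word))     # False
print(middle in suffix_grams(word))     # False
print(middle in substring_grams(word))  # True
```

Only the plain n-gram set contains the middle fragment, which is the behavior the field type below relies on.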

The solution is to use a NgramField like this:

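from haystack import indexes

class MyIndex(indexes.SearchIndex, indexes.Indexable):
    ...
    # NgramField indexes n-grams of the value rather than only edge n-grams
    field_to_index = indexes.NgramField(model_attr='field_name')
    ...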

This is very elegant because you don't need to manually add anything to schema.xml.
