Hibernate Search: Not tokenized query

开发者 (Developer) https://www.devze.com 2023-02-04 09:44 Source: web

I'm using Hibernate Search. The problem is that when I search for this string:

"l"

I get no results. If I try this instead:

"l*"

the results are:

"Lampada bla bla" "Lampione bla bla bla" "Lost"

This is my POJO:

@Id @GeneratedValue
@DocumentId
private Long id;

@Field(index=Index.TOKENIZED, store=Store.YES)
private String nome;

@Field(index=Index.TOKENIZED, store=Store.YES, termVector=TermVector.YES)
private String descrizione;

@Column(length=30)
public String getNome() {
    return nome;
}
public void setNome(String nome) {
    this.nome = nome;
}

@Column(length=100)
public String getDescrizione() {
    return descrizione;
}
public void setDescrizione(String descrizione) {
    this.descrizione = descrizione;
}
public Long getId() {
    return id;
}
public void setId(Long id) {
    this.id = id;
}

@Override
public String toString() {
    return String.format("(%s) %s: %s", id, nome, descrizione);
}

This is my Java class:

Session session = super.session();
List result = new ArrayList();
luceneSession = Search.getFullTextSession(session);

String[] fields = (String[]) boostsNField.keySet().toArray(new String[boostsNField.keySet().size()]);

// Analyzer applied to the query text; ideally the same one used at index time
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, fields, analyzer, boostsNField);

try {
    Query luceneQuery = parser.parse(queryString);
    // The entity class(es) to search can optionally be passed to createFullTextQuery
    org.hibernate.Query fullTextQuery = luceneSession.createFullTextQuery(luceneQuery);
    result = fullTextQuery.list();

} catch (ParseException e) {

Where is the problem?

You're probably using StandardAnalyzer to index your documents. As its Javadoc says, it uses StandardTokenizer. This tokenizer extracts words from the text and processes them with a few simple rules (see the Javadoc). The tokens are then filtered (StandardAnalyzer lowercases them and removes English stop words), but generally each word becomes one token.
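To make the "each word becomes a token" step concrete, here is a rough plain-Java approximation of what StandardAnalyzer does to a field value. It is a hypothetical sketch: the class and method names are made up, the stop-word set is only a small subset of Lucene's English list, and Lucene's real tokenizer follows more detailed rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerSketch {
    // Small subset of Lucene's English stop words, for illustration only
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "to");

    /** Rough approximation of StandardAnalyzer: split into words, lowercase, drop stop words. */
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on any run of characters that are not letters or digits
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) continue; // a leading separator yields an empty first element
            String token = word.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Lampada bla bla")); // [lampada, bla, bla]
        System.out.println(analyze("The Lost lamp"));   // [lost, lamp]
    }
}
```

Note that "Lampada" ends up in the index as the single term "lampada"; no term "l" is ever produced.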

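I don't know the details, but I think that when searching, Lucene compares the terms of the query with the indexed tokens (words, in your case). So querying for "l" returns an empty list, because the token "l" is not the same as the token "lampada". The query "l*" works because it is a prefix query, which matches every indexed term that starts with "l".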
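The difference between the two queries can be sketched with a toy inverted index that maps each lowercased word to the documents containing it. This is a hypothetical plain-Java model, not Lucene code: `termQuery` stands in for an exact term match and `prefixQuery` for a prefix query such as "l*".

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class TermLookupSketch {
    // term -> ids of the documents containing it (a toy inverted index)
    private final TreeMap<String, Set<Integer>> index = new TreeMap<>();

    public void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Like a term query: the query term must equal an indexed term exactly. */
    public Set<Integer> termQuery(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
    }

    /** Like a prefix query ("l*"): collect the docs of every indexed term starting with the prefix. */
    public Set<Integer> prefixQuery(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        Set<Integer> hits = new TreeSet<>();
        // Terms are sorted, so all terms with this prefix form a contiguous range starting at p
        for (Map.Entry<String, Set<Integer>> e : index.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) break;
            hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        TermLookupSketch idx = new TermLookupSketch();
        List<String> docs = List.of("Lampada bla bla", "Lampione bla bla bla", "Lost");
        for (int i = 0; i < docs.size(); i++) idx.add(i, docs.get(i));
        System.out.println(idx.termQuery("l"));   // []
        System.out.println(idx.prefixQuery("l")); // [0, 1, 2]
    }
}
```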

If you want to search your index by any substring of the indexed text, you should consider using (or writing) a tokenizer based on n-grams. It produces a token for every substring of the given string (within a configurable length range). For the string "Lampada" it will produce "L", "La", "Lam", ..., "ada", "da", "a", so even when you query the index with StandardAnalyzer and query="l", you will find the matching documents. Be aware that this approach makes the index grow much faster.
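A minimal sketch of n-gram generation in plain Java, assuming grams are lowercased and limited to a `minGram`..`maxGram` length range (both parameters are illustrative). Lucene ships its own n-gram tokenizers and filters in its analyzers module; this is not that API, just the idea behind it.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class NGramSketch {
    /** All distinct substrings of text with length in [minGram, maxGram], lowercased. */
    public static Set<String> ngrams(String text, int minGram, int maxGram) {
        String s = text.toLowerCase(Locale.ROOT);
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < s.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= s.length(); len++) {
                grams.add(s.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        Set<String> grams = ngrams("Lampada", 1, 3);
        System.out.println(grams.contains("l"));   // true: a query for "l" now matches a term
        System.out.println(grams.contains("amp")); // true
        // One 7-letter word became many index terms, which is why the index grows faster
        System.out.println(grams.size());          // 16
    }
}
```

Raising `maxGram` lets longer substrings match but multiplies the number of terms per word, which is the index-size trade-off mentioned above.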