Finding the start and end of a match with Lucene

devze.com (https://www.devze.com) 2023-01-08 15:38, source: the web
I would like to find the start and end positions of a match from a Lucene (version 3.0.2 for Java) query. It seems like I should be able to get this information from Highlighter or FastVectorHighlighter, but these classes seem to return only a text fragment with the relevant text highlighted. Is there any way to get this information, either from a Highlighter or from the ScoreDoc itself?

Update: I found this related question: "Finding the position of search hits from Lucene".

But I think the answer by Allasso won't work for me because my queries are phrases, not individual terms.


If I were you, I'd just take the code from FastVectorHighlighter. The relevant code is in FieldTermStack:

        List<string> termSet = fieldQuery.getTermSet(fieldName);
        // The mapper receives the stored term vector for this document and field
        VectorHighlightMapper tfv = new VectorHighlightMapper(termSet);
        reader.GetTermFreqVector(docId, fieldName, tfv);  // <-- look at this line

        string[] terms = tfv.GetTerms();
        foreach (string term in terms)
        {
            if (!termSet.Contains(term)) continue;  // keep only terms from the query
            int index = tfv.IndexOf(term);
            // Start/end character offsets of every occurrence of this term
            // (null unless the field was indexed with term vector offsets)
            TermVectorOffsetInfo[] tvois = tfv.GetOffsets(index);
            if (tvois == null) return; // just return to make null snippets
            // Token positions of the same occurrences, parallel to tvois
            int[] poss = tfv.GetTermPositions(index);
            if (poss == null) return; // just return to make null snippets
            for (int i = 0; i < tvois.Length; i++)
                termList.AddLast(new TermInfo(term, tvois[i].GetStartOffset(), tvois[i].GetEndOffset(), poss[i]));
        }

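The key call is reader.GetTermFreqVector(); it can only return offsets and positions if the field was indexed with term vectors that store them. As I said, FastVectorHighlighter already does this legwork, so I would just copy it. If you would rather write it yourself, GetOffsets() gives the start and end character offsets of each hit and GetTermPositions() gives its token position, which together are everything you need.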
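The loop above yields per-term hits, but the question is about phrases. One way to turn per-term hits into a phrase span is to look for the query terms at consecutive token positions and take the start offset of the first term and the end offset of the last. The following is a minimal, self-contained Java sketch of that idea under stated assumptions; it does not call Lucene at all. `TermOcc`, `tokenize` and `phraseSpans` are hypothetical names, and the toy tokenizer only stands in for the (term, start offset, end offset, position) values a term vector stored with positions and offsets would supply. It ignores phrase slop and assumes one token per position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhraseSpans {

    // Hypothetical stand-in for one term-vector hit: the same four values
    // the FieldTermStack loop passes to TermInfo.
    public static final class TermOcc {
        final String term;
        final int start, end, position;

        TermOcc(String term, int start, int end, int position) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.position = position;
        }
    }

    // Toy tokenizer (assumption, not a Lucene analyzer): lowercases, splits on
    // non-letter/digit characters, and records character offsets and positions.
    public static List<TermOcc> tokenize(String text) {
        List<TermOcc> out = new ArrayList<>();
        int pos = 0, i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) { i++; continue; }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            out.add(new TermOcc(text.substring(start, i).toLowerCase(), start, i, pos++));
        }
        return out;
    }

    // Returns [start, end) character spans where the phrase terms occur at
    // consecutive positions: start offset of the first term, end of the last.
    public static List<int[]> phraseSpans(List<TermOcc> occs, String... phrase) {
        List<int[]> spans = new ArrayList<>();
        if (phrase.length == 0) return spans;
        Map<Integer, TermOcc> byPos = new HashMap<>(); // assumes one token per position
        for (TermOcc o : occs) byPos.put(o.position, o);
        for (TermOcc first : occs) {
            if (!first.term.equals(phrase[0])) continue;
            TermOcc last = first;
            boolean ok = true;
            for (int k = 1; k < phrase.length && ok; k++) {
                TermOcc next = byPos.get(first.position + k);
                ok = next != null && next.term.equals(phrase[k]);
                if (ok) last = next;
            }
            if (ok) spans.add(new int[] {first.start, last.end});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox; a quick brown dog.";
        for (int[] s : phraseSpans(tokenize(text), "quick", "brown"))
            System.out.println(s[0] + "-" + s[1] + ": " + text.substring(s[0], s[1]));
    }
}
```

Running `main` prints `4-15: quick brown` and `23-34: quick brown`. With Lucene, the `TermOcc` list would instead be built from the offsets and positions collected in the FieldTermStack loop, sorted by position.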
