
Indexing PDFs with page numbers in Solr

Developer https://www.devze.com 2023-01-23 16:33 Source: Internet

I'm indexing PDFs with Solr using the ExtractingRequestHandler. I would like to display the page number along with hits in a document, e.g. "term foo was found in bar.pdf on pages 2, 3 and 5."

Is it possible to include page numbers in the query result like this?


It would require some development effort, but you could achieve this by indexing each page of each PDF as a separate Solr document, and then using field collapsing to group the page hits that belong to the same PDF.
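A minimal sketch of the per-page indexing step, assuming the text of each page has already been extracted separately (that extraction step is not shown and is hypothetical here). The field names `id`, `doc_id`, `page` and `text` are illustrative assumptions and would have to match your own schema; the resulting documents could then be sent to Solr's update handler.

```python
import json


def build_page_docs(doc_id, page_texts):
    """Turn one PDF into one Solr document per page.

    page_texts: list of strings, one per page, in page order (assumed to come
    from a hypothetical per-page text extraction step).
    Field names are illustrative, not a required Solr schema.
    """
    docs = []
    for page_number, text in enumerate(page_texts, start=1):
        docs.append({
            "id": f"{doc_id}#page{page_number}",  # unique key for this page
            "doc_id": doc_id,                      # shared by all pages; used for grouping
            "page": page_number,                   # 1-based page number shown to the user
            "text": text,                          # the searchable page content
        })
    return docs


if __name__ == "__main__":
    pages = ["introduction", "foo appears here", "no match", "nothing", "more foo"]
    print(json.dumps(build_page_docs("bar.pdf", pages), indent=2))
```

Because every page carries the same `doc_id`, a grouped query can later fold all matching pages back under their PDF.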

Note that you need a nightly build for this; field collapsing is not implemented in any currently released Solr version.

Also note: field collapsing is available as of Solr 3.3, and further improvements are expected in the next major release (Solr 4.0).
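A rough sketch of the query side, assuming per-page documents that share a hypothetical `doc_id` field and store a `page` number, with page text in a hypothetical `text` field. It uses Solr's result grouping parameters (`group`, `group.field`, `group.limit`) and reads the grouped JSON response to produce a sentence like the one in the question. `group.limit` defaults to 1, so it is raised here to return every matching page per PDF; the value 100 is an arbitrary assumption.

```python
from urllib.parse import urlencode


def grouping_params(term, group_field="doc_id", pages_per_doc=100):
    # Result grouping (field collapsing) parameters; field names are assumptions.
    return {
        "q": f"text:{term}",
        "fl": f"{group_field},page",
        "group": "true",
        "group.field": group_field,
        "group.limit": pages_per_doc,  # default is 1; raise it to get all matching pages
        "wt": "json",
    }


def join_pages(pages):
    # Format [2, 3, 5] as "2, 3 and 5".
    parts = [str(p) for p in pages]
    if len(parts) == 1:
        return parts[0]
    return ", ".join(parts[:-1]) + " and " + parts[-1]


def summarize(term, response, group_field="doc_id"):
    # Walk the grouped response: one group per PDF, one doc per matching page.
    lines = []
    for group in response["grouped"][group_field]["groups"]:
        pages = sorted(doc["page"] for doc in group["doclist"]["docs"])
        label = "page" if len(pages) == 1 else "pages"
        lines.append(
            f"term {term} was found in {group['groupValue']} on {label} {join_pages(pages)}"
        )
    return lines


if __name__ == "__main__":
    print("select?" + urlencode(grouping_params("foo")))
```

Pages are sorted explicitly because, within a group, Solr orders documents by relevance unless a `group.sort` is given.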
