apache-tika
Solr Tika extraction problem
I am using Tika with DataImportHandler. While executing the full-import, I am getting the following errors.
2023-02-16 01:17 Category: Q&A
Retrieving extracted text with Apache Solr
I'm new to Apache Solr, and I want to use it for indexing PDF files. I have managed to get it up and running, and I can now search the PDF files I have added.
2023-02-09 07:02 Category: Q&A
Indexing PDF with page numbers with Solr
I'm indexing PDFs with Solr using the ExtractingRequestHandler. I would like to display the page number along with hits in a document, e.g. "term foo was found in bar.pdf on pages 2, 3 and 5."
2023-01-23 16:33 Category: Q&A
Using Solr CELL's ExtractingRequestHandler to index/extract files from package formats
Can you use ExtractingRequestHandler and Tika with any of the compressed file formats (zip, tar, gz, etc.) to extract the content for indexing?
2023-01-21 17:11 Category: Q&A
Solr's TikaEntityProcessor not working
I'm trying to get Solr to index a database in which one column is a filename of a PDF document I'd like to index. My configuration looks like this:
2023-01-02 10:39 Category: Q&A
Solr; What does this mean?
At the end of the README.txt file, which is located in the example directory under solr, I find this line:
2023-01-01 18:05 Category: Q&A
Indexing PDF files with Symfony using Lucene
I am a Symfony developer and my web server is Linux. I already use the sfLucene plugin. What is the simplest way of indexing PDF files for search on a Linux PHP server?
2022-12-21 03:10 Category: Q&A
Solr ExtractingRequestHandler giving empty content for pdf documents
I am using ExtractingRequestHandler in Solr to extract document content and index it. It works fine for all Microsoft documents, but for PDFs the extracted content is empty. I have also tried ...
2022-12-15 07:10 Category: Q&A