apache-tika
Solr : file entity processor and delta import
I'm using Solr 3.3 and I want to use delta import with the file entity processor and the Tika entity processor. Full import works fine, but the delta import parameter doesn't import the new do[more]
2023-04-07 19:55 Category: Q&A
Error while parsing Binary Files... (mostly PDF)
I am trying to parse PDF files with Apache Tika, using a ByteArrayInputStream for binary files. I started getting errors for some PDF files, while others parse fine. Earlier I was able[more]
2023-04-06 09:49 Category: Q&A
Trying to override dependency of Apache Tika 0.9 from PDFBOX 1.4.0 to PDFBOX 1.6.0
<dependency> <groupId>org.apache.tika</groupId> <artifactId>tika-parsers</artifactId>[more]
2023-04-06 07:06 Category: Q&A
How can I integrate Tika in my Lucene project?
I want to integrate Apache Tika in my Java project. I need to get text from different file formats (Excel, doc, ppt, and more).[more]
2023-04-03 01:52 Category: Q&A
javax.mail.MessagingException in Tika
Hi, I am using Apache Tika. I made a few changes to Tika for my requirements and I am able to build it successfully, but when I try to run Tika I get the following exce[more]
2023-03-31 03:55 Category: Q&A
Getting MimeType subtype with Apache Tika
I'd need to get the iana.org MediaType rather than application/zip or application/x-tika-msoffice for documents like odt, ppt, pptx, xlsx, etc.[more]
2023-03-29 09:07 Category: Q&A
Tika returning incorrect line of text for PDF with lots of tables
I am using Tika to extract text from a PDF file that has a lot of tables. java -jar tika-app-0.9.jar -t https://s3.amazonaws.com/centraldoc/alg1.pdf[more]
2023-03-28 18:28 Category: Q&A
Error with Extracting PDF metadata using Solr
I am using Solr 3.3 and I am trying to extract and index metadata from PDF files. I am using the DataImportHandler with the TikaEntityProcessor to add the documents. Here are the fields as defined[more]
2023-03-24 09:33 Category: Q&A
Solr open document after searching a keyword
I am trying to index some PDF documents and then create a search UI. This question is somewhat related to[more]
2023-03-22 23:45 Category: Q&A
Verifying integrity of documents
What are the steps to verify the integrity of these documents? doc, docx, docm, odt, rtf, pdf, odf, odp, xls, xlsx, xlsm, ppt, pptm[more]
2023-03-22 10:05 Category: Q&A