
How can I integrate Tika in my Lucene project?

Developer https://www.devze.com 2023-04-03 01:52 Source: Internet

I want to integrate Apache Tika into my Java project. I need to extract text from different file formats (Excel, Word, PowerPoint, and more). After some reading, my understanding is that the only way to get Tika is to download the source archive and build it with Maven. I ran "mvn install" in the root directory of the Tika source (apache-tika-0.9-src), but I get this error:

[INFO] Scanning for projects...
Downloading: http://repo1.maven.org/maven2/org/apache/apache/6/apache-6.pom
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR]   The project org.apache.tika:tika:0.9 (C:\Users\vexler\Documents\Installs\apache-tika-0.9-src\apache-tika-0.9\pom.xml) has 1 error
[ERROR]     Non-resolvable parent POM for org.apache.tika:tika-parent:0.9: Could not transfer artifact org.apache:apache:pom:6 from/to central (http://repo1.maven.org/maven2): Error transferring file: Connection timed out: connect and 'parent.relativePath' points at no local POM @ org.apache.tika:tika-parent:0.9, C:\Users\vexler\Documents\Installs\apache-tika-0.9-src\apache-tika-0.9\tika-parent\pom.xml, line 25, column 11 -> [Help 2]

I would really appreciate any help with this error. Thanks :-) Reuth


Assuming you're using Maven in your project, life is much, much simpler.

Just add something like this to the dependencies in your pom.xml:

<dependency>
   <groupId>org.apache.tika</groupId>
   <artifactId>tika-parsers</artifactId>
   <version>0.9</version>
   <scope>provided</scope>
</dependency>

Maven will then download Tika and its dependencies for you.
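Note that this download still needs Maven to reach the central repository, which is the step that timed out in the question. The log only says "Connection timed out", so the cause is not known. If it turns out to be an HTTP proxy between you and repo1.maven.org (an assumption), a proxy entry in your Maven settings.xml, normally found in the .m2 folder under your user home, may help. A minimal sketch, where the id, host and port are placeholders rather than real values:

```xml
<settings>
  <proxies>
    <proxy>
      <!-- Placeholder id; any unique name works -->
      <id>example-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <!-- Placeholder host and port; use the values from your own network -->
      <host>proxy.example.com</host>
      <port>8080</port>
      <nonProxyHosts>localhost</nonProxyHosts>
    </proxy>
  </proxies>
</settings>
```

If there is no proxy involved, this block will not change anything, and the timeout is more likely a firewall or a temporary network problem.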

Alternatively, if you download the latest Tika OSGi bundle JAR (e.g. 0.9) and unpack it, you'll get the Tika code and its dependencies inside it.
