
How to extract text from a .docx (Word 2007 and above) using Apache POI

Developer https://www.devze.com 2023-01-13 14:55 Source: Internet
Hi, I'm using Apache POI 3.6. I've already written some code:

XWPFDocument doc = new XWPFDocument(new FileInputStream(file));
wordxExtractor = new XWPFWordExtractor(doc);
text = wordxExtractor.getText();

System.out.println("adding docx " + file);
d.add(new Field("content", text, Field.Store.NO, Field.Index.ANALYZED));

Unfortunately, it throws an error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/dom4j/DocumentException
at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:149)
at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:136)
at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:98)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:199)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:53)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:98)
at org.apache.lucene.demo.Indexer.indexDocs(Indexer.java:153)
at org.apache.lucene.demo.Indexer.main(Indexer.java:88)

It seems that it used the constructor

XWPFWordExtractor(OPCPackage container)

rather than this one:

XWPFWordExtractor(XWPFDocument document)

Any idea why? Or any idea how I can extract the text from the .docx and convert it into a String?


You need to add the dom4j library to your classpath or to your project libraries.


It looks like you don't have all of the dependencies on your classpath.

If you look at http://poi.apache.org/overview.html you'll see that dom4j is a required library when working with the OOXML files. From the exception you got, it seems that you don't have it... If you look in the POI binary download, you should find it in the ooxml-libs subdirectory.
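A quick way to confirm this diagnosis is to ask the JVM whether it can load the class named in the exception. The following is only a minimal sketch: the class name `ClasspathCheck` and the method `isOnClasspath` are made up for illustration, and the only thing taken from the question is the class name `org.dom4j.DocumentException` from the stack trace.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class reported missing by the NoClassDefFoundError in the question
        String needed = "org.dom4j.DocumentException";
        if (isOnClasspath(needed)) {
            System.out.println(needed + " found: dom4j is on the classpath");
        } else {
            System.out.println(needed + " missing: add the dom4j jar to the classpath");
        }
    }
}
```

Run it with the same classpath as the indexer. If it reports the class as missing, adding the dom4j jar from the POI download should make the `XWPFDocument` constructor work; note that the stack trace shows the failure inside that constructor, before `XWPFWordExtractor` is ever created.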


You could try docx4j instead; see http://dev.plutext.org/svn/docx4j/trunk/docx4j/src/main/java/org/docx4j/TextUtils.java
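If you only need plain text and want to avoid extra libraries altogether, another option is to read the file directly: a .docx is a ZIP archive whose main text lives in `word/document.xml`. The sketch below uses only the JDK and is not how POI or docx4j do it; the class name `DocxPlainText` is hypothetical, and it deliberately ignores headers, footers, footnotes, tabs and line breaks, so treat it as a starting point rather than a full extractor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxPlainText {

    // Namespace of WordprocessingML elements such as w:p (paragraph) and w:t (text)
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Reads word/document.xml from the .docx ZIP and returns its text, one paragraph per line. */
    public static String extract(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return textOf(readCurrentEntry(zip));
                }
            }
        }
        throw new IOException("word/document.xml not found; not a .docx file?");
    }

    // Copies the current ZIP entry into memory so the XML parser cannot close the ZIP stream
    private static byte[] readCurrentEntry(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Concatenates the w:t text of each w:p paragraph, ending every paragraph with '\n'
    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder text = new StringBuilder();
        NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            text.append('\n');
        }
        return text.toString();
    }
}
```

Usage would look like `String text = DocxPlainText.extract(new FileInputStream(file));`, after which the string can be added to the Lucene field as in the question. Paragraphs nested inside other paragraphs (for example in text boxes) may be emitted twice by this simple traversal.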
