I have a directory filled with subdirectories, all containing PDF files and/or further subdirectories of PDF files. Essentially, it's a very disorganized collection of PDFs. What I'd like to do is parse each file, pull out the contents of one specific field, and dump the output to a text file. The end result would be one large text file containing the contents of that field from each individual PDF. Surely this is possible. The question is whether it can be done easily, without much programming.
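The "walk every subdirectory and write one line per PDF" part needs no third-party code at all in Java. Below is a minimal sketch using only the standard library (`java.nio.file.Files.walk`). The actual field lookup is deliberately left behind a hypothetical `FieldExtractor` interface, because reading a PDF form field requires a PDF library; the placeholder in `main` just reports the file size so the sketch runs as-is. The class name, interface, and tab-separated output format are all illustrative assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PdfFieldDump {

    /** Hypothetical hook: a real PDF library would do the field lookup here. */
    @FunctionalInterface
    interface FieldExtractor {
        String extract(Path pdf) throws IOException;
    }

    /** Recursively collects every *.pdf file (case-insensitive) under root, sorted by path. */
    static List<Path> findPdfs(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Writes one "path<TAB>value" line per PDF to out and returns the number of lines written. */
    static int dump(Path root, Path out, FieldExtractor extractor) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Path pdf : findPdfs(root)) {
            String value = extractor.extract(pdf);
            lines.add(pdf + "\t" + (value == null ? "" : value));
        }
        Files.write(out, lines, StandardCharsets.UTF_8);
        return lines.size();
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Path out = Paths.get(args.length > 1 ? args[1] : "fields.txt");
        // Placeholder extractor (assumption): reports the file size instead of a real field value.
        int n = dump(root, out, pdf -> Long.toString(Files.size(pdf)));
        System.out.println("Wrote " + n + " lines to " + out);
    }
}
```

Usage would be something like `java PdfFieldDump /path/to/pdfs fields.txt`. Swapping the placeholder lambda for one that opens the PDF with whichever library you choose is the only PDF-specific part.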
In my opinion, the best option is to pay a little money for a third-party component that provides an API for this.
http://www.aspose.com/categories/java-components/aspose.pdf-for-java/default.aspx
http://www.pdfcomponent.com/java-pdf/
If it doesn't have to be in Java, I believe PHP has an open-source PDF library.
I've only ever used the PDF generation features of iText, but I know it also has PDF text extraction features. It's licensed under the AGPL, or under a paid commercial license if you need to redistribute it.
http://itextpdf.com/