PDF Parser API in Java [closed]

Developer https://www.devze.com 2023-01-07 14:26 Source: Internet

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Related topics: parsing pdf

Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.

Closed 9 years ago.

I want to convert PDF data into our own file specification. Please help me choose the right API for PDF parsing in Java or .NET. The parser should extract each and every component (element) from the PDF pages.

There's a library called iText that does what you want. It's sort of the #1 product out there and is free as in beer.

I've worked with iText before, extracting content from PDFs, and while it's not fully automatic, it allows you to get at everything.

Recommended, in other words.

Elements, as such, do not exist in a PDF file. The file is a set of PDF objects that together generate the pages.
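To make the "set of objects" point concrete, here is a rough sketch in plain Java (no PDF library) that lists the indirect object headers, such as `12 0 obj`, found in a file's bytes. The class name `PdfObjectScan` and the method `findObjects` are made up for illustration. This is a naive text scan, not a parser: it assumes an uncompressed, unencrypted file, it will miss objects packed inside compressed object streams, and it can be fooled by binary stream data that happens to look like a header. A real library reads the cross-reference table instead.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any PDF library.
class PdfObjectScan {
    // Matches an indirect object header: object number, generation number, "obj".
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?<![0-9])(\\d+)\\s+(\\d+)\\s+obj\\b");

    /** Returns "number generation" labels for every header found, in file order. */
    static List<String> findObjects(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data survives decoding.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        List<String> found = new ArrayList<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            found.add(m.group(1) + " " + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, scan that file; otherwise scan a tiny inline sample.
        byte[] bytes = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "1 0 obj\n<< /Type /Catalog >>\nendobj\n".getBytes(StandardCharsets.ISO_8859_1);
        for (String label : findObjects(bytes)) {
            System.out.println(label + " obj");
        }
    }
}
```

Run it against a simple PDF and you get one line per object header it finds. What you do not get is anything resembling a "paragraph" or "table" element; those have to be reconstructed from the page content streams, which is the work a library does for you.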

Try PDFBox: http://java-source.net/open-source/pdf-libraries/pdf-box

Hope it will help.