Read PDF through Java and get the HTML Content

Developer https://www.devze.com 2022-12-31 18:57 Source: Internet
I want to read an existing PDF file and get not only the text but also the formatting information, such as fonts (bold, italic), paragraphs, images, and tables. Basically, I want to write HTML that looks similar to the PDF.

Is there a code library for doing this? I am looking for an open source library.

Regards, Tina Agrawal


Try PDFBox or iText. Both are open source and can handle text, images, tables, etc.
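As a rough illustration of the "write the HTML" half of the problem, here is a minimal sketch that uses only the Java standard library. It assumes you have already pulled text runs and their font names out of the PDF with one of the libraries above (with PDFBox, for example, font names are typically reachable via `TextPosition.getFont().getName()` in a `PDFTextStripper` subclass). The `Run` class and `toHtml` method are hypothetical names made up for this example, and guessing bold/italic from the font name is only a heuristic, not something either library guarantees.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration: Run and toHtml are not part of PDFBox or iText.
final class StyledRunsToHtml {

    /** One run of text plus the name of the font it was drawn with. */
    static final class Run {
        final String text;
        final String fontName;

        Run(String text, String fontName) {
            this.text = text;
            this.fontName = fontName;
        }
    }

    /** Escapes the characters that are special in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders one paragraph, guessing bold/italic from names like "Helvetica-BoldOblique". */
    static String toHtml(List<Run> paragraph) {
        StringBuilder html = new StringBuilder("<p>");
        for (Run run : paragraph) {
            String name = run.fontName.toLowerCase(Locale.ROOT);
            boolean bold = name.contains("bold");
            boolean italic = name.contains("italic") || name.contains("oblique");
            String piece = escape(run.text);
            if (italic) piece = "<i>" + piece + "</i>";
            if (bold) piece = "<b>" + piece + "</b>";
            html.append(piece);
        }
        return html.append("</p>").toString();
    }

    public static void main(String[] args) {
        List<Run> paragraph = List.of(
            new Run("Total: ", "Helvetica-Bold"),
            new Run("5 < 7", "Times-Italic"),
            new Run(" (approx.)", "Helvetica"));
        // prints <p><b>Total: </b><i>5 &lt; 7</i> (approx.)</p>
        System.out.println(toHtml(paragraph));
    }
}
```

Paragraphs, images, and tables need more work than this: a PDF usually stores positioned glyphs rather than logical structure, so those have to be inferred from layout.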


If you want an exact copy of the page, you may need to render the page as an image and overlay invisible text on it. You can see some ideas of what is possible with PDF to HTML conversion on our blog at http://www.jpedal.org/PDFblog/2012/08/4-ways-to-convert-pdf-to-html5/.
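To make the image-plus-invisible-text idea concrete, here is a small sketch of the HTML it could produce. It is not taken from JPedal or any other library: the class and method names are hypothetical, the page image is assumed to have been rendered already, and the coordinates are assumed to be converted to CSS pixels with the origin at the top left.

```java
import java.util.Locale;

// Hypothetical sketch; none of these names come from JPedal, PDFBox, or iText.
final class InvisibleTextOverlay {

    /** Positions one transparent but selectable span over the page image (CSS pixels). */
    static String span(String text, double left, double top, double fontSize) {
        String safe = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return String.format(Locale.ROOT,
            "<span style=\"position:absolute;left:%.1fpx;top:%.1fpx;font-size:%.1fpx;color:transparent\">%s</span>",
            left, top, fontSize, safe);
    }

    /** Wraps the page image and its spans in one relatively positioned container.
     *  imageFile is assumed to be a trusted file name and is not escaped here. */
    static String page(String imageFile, String spans) {
        return "<div style=\"position:relative\"><img src=\"" + imageFile + "\" alt=\"\">" + spans + "</div>";
    }

    public static void main(String[] args) {
        System.out.println(page("page1.png", span("Hello PDF", 72, 90.5, 12)));
    }
}
```

The image keeps the page visually exact, while the transparent spans keep the text selectable and searchable.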
