
Open Source library / tool for converting PDF to HTML? [closed]

Developer https://www.devze.com 2023-03-20 04:12 Source: Internet
Closed. This question is seeking recommendations for books, tools, software libraries, and more. It does not meet Stack Overflow guidelines. It is not currently accepting answers.

We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 6 years ago.


My needs are quite simple: I need a tool or library (a library would be ideal) to convert a PDF file to an HTML file, keeping as much of the information as possible. I don't need images or styles, just the semantic information.

I've checked out iTextPdf, but I haven't found anything in it that does this. Any help would be appreciated.

Thanks in advance


Use iTextSharp. It's free and you only need the "itextsharp.dll".

http://sourceforge.net/projects/itextsharp/

Here is a simple function for reading the text out of a PDF.

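Public Shared Function GetTextFromPDF(PdfFileName As String) As String
    ' Open the PDF; the reader keeps the file open until Close() is called.
    Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
    Dim sOut As New System.Text.StringBuilder()

    Try
        ' iTextSharp page numbers start at 1.
        For i As Integer = 1 To oReader.NumberOfPages
            ' Use a fresh strategy for each page so text does not carry over between pages.
            Dim its As New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy()

            ' AppendLine keeps the last word of one page from running into the next page.
            sOut.AppendLine(iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(oReader, i, its))
        Next
    Finally
        ' Release the file handle even if extraction fails.
        oReader.Close()
    End Try

    Return sOut.ToString()
End Function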
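
The function above only returns plain text, so if you need an HTML file you still have to wrap that text yourself. Here is a rough sketch of one way to do it, assuming that treating each non-empty line as a paragraph is good enough for your documents. The module name PlainTextHtml and the function TextToHtml are made up for this example and are not part of iTextSharp; only WebUtility.HtmlEncode comes from the .NET framework.

```vb
Imports System
Imports System.Net
Imports System.Text
Imports Microsoft.VisualBasic

Public Module PlainTextHtml
    ' Hypothetical helper (not an iTextSharp API): wraps extracted text in bare HTML.
    Public Function TextToHtml(plainText As String) As String
        Dim sb As New StringBuilder()
        sb.Append("<html><body>").Append(vbLf)

        ' Normalize line endings, then split into lines and drop the empty ones.
        Dim lines As String() = plainText.Replace(vbCrLf, vbLf).Split({vbLf}, StringSplitOptions.RemoveEmptyEntries)

        For Each rawLine As String In lines
            Dim trimmed As String = rawLine.Trim()
            If trimmed.Length > 0 Then
                ' HtmlEncode escapes <, > and & so text from the PDF cannot break the markup.
                sb.Append("<p>").Append(WebUtility.HtmlEncode(trimmed)).Append("</p>").Append(vbLf)
            End If
        Next

        sb.Append("</body></html>")
        Return sb.ToString()
    End Function
End Module
```

You would call it as TextToHtml(GetTextFromPDF("input.pdf")), where "input.pdf" is just a placeholder path. This does not recover headings, lists or tables; the simple extraction strategy does not expose that structure, so every line ends up as a plain paragraph.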