My needs are quite simple: I need a tool or library (a library would be ideal) to convert a PDF file to an HTML file, keeping as much of the information as possible. I don't need images or styles, just the semantic information.
I've checked out iTextPdf, but I haven't found anything like that in it. Any help would be nice.
Thanks in advance
Use iTextSharp. It's free and you only need the "itextsharp.dll".
http://sourceforge.net/projects/itextsharp/
Here is a simple function for reading the text out of a PDF.
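    Imports iTextSharp.text.pdf
    Imports iTextSharp.text.pdf.parser

    ' Returns the concatenated plain text of every page in the PDF.
    Public Shared Function GetTextFromPDF(PdfFileName As String) As String
        Dim oReader As New PdfReader(PdfFileName)
        Dim sOut As String = ""
        Try
            For i As Integer = 1 To oReader.NumberOfPages
                ' Use a fresh extraction strategy for each page.
                Dim its As New SimpleTextExtractionStrategy()
                sOut &= PdfTextExtractor.GetTextFromPage(oReader, i, its)
            Next
        Finally
            oReader.Close() ' Release the file handle even if extraction fails.
        End Try
        Return sOut
    End Function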
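Since the question asks for HTML rather than plain text, here is a minimal sketch of how the extracted text might be wrapped in bare HTML. `TextToHtml` and the `PdfTextToHtml` module are hypothetical names, not part of iTextSharp. The sketch assumes that a blank line in the extracted text marks a paragraph break, which may not hold for every PDF, since most PDFs carry little semantic structure to begin with. It relies only on `System.Net.WebUtility.HtmlEncode` from the .NET Framework (4.0 or later).

```vbnet
Imports System.Net
Imports System.Text

Public Module PdfTextToHtml
    ' Hypothetical helper (not part of iTextSharp): wraps plain text in minimal HTML.
    ' Assumption: a blank line in the input marks a paragraph break.
    Public Function TextToHtml(text As String) As String
        Dim sb As New StringBuilder()
        sb.Append("<html><body>")
        ' Normalize Windows and old Mac line endings to a single line feed.
        Dim normalized As String = text.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf)
        Dim blocks As String() = normalized.Split({vbLf & vbLf}, StringSplitOptions.RemoveEmptyEntries)
        For Each block As String In blocks
            Dim trimmed As String = block.Trim()
            If trimmed.Length = 0 Then Continue For
            ' Escape <, > and & so the text cannot break the markup.
            sb.Append("<p>").Append(WebUtility.HtmlEncode(trimmed)).Append("</p>")
        Next
        sb.Append("</body></html>")
        Return sb.ToString()
    End Function
End Module
```

It could then be combined with the function above, for example `Dim html As String = TextToHtml(GetTextFromPDF("input.pdf"))`, where `input.pdf` is a placeholder path. Anything richer than paragraphs (headings, lists, tables) would have to be inferred from layout, which this sketch does not attempt.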