I want to recognize tables inside PDF files. What SDK can be used in C# to recognize tables inside PDFs, with some mechanism to read them cell by cell? Can anyone please suggest one?
PDFsharp is good and it's free. I've never done this specifically, but its object model corresponds to all the major objects in the PDF format.
Tables do not exist inside a PDF as a structure unless the file was created with marked content and additional tagging in it. Otherwise the page only holds positioned text and lines, so any table has to be inferred from the layout. I wrote a blog post explaining some of the issues with text extraction from PDF files at http://www.jpedal.org/PDFblog/2009/04/pdf-text/
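To illustrate what "no table structure" means in practice, here is a rough sketch of how rows and cells might be guessed from positioned text. It is written in Python only to keep it short, and it does not use any PDF library: the `(x, y, text)` fragment shape, the `group_into_rows` name and the `y_tolerance` value are all assumptions made for this example, standing in for whatever coordinates your extractor of choice reports. The same idea should port to C# directly.

```python
from typing import List, Tuple

# Hypothetical fragment shape: (x, y, text), as a text extractor might report it.
Fragment = Tuple[float, float, str]


def group_into_rows(fragments: List[Fragment], y_tolerance: float = 2.0) -> List[List[str]]:
    """Guess table rows by clustering fragments with similar y coordinates."""
    rows = []  # each entry is (anchor_y, [fragments on that line])
    # Assumes PDF-style coordinates where y grows upward, so sorting by
    # descending y walks the page from top to bottom.
    for frag in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - frag[1]) <= y_tolerance:
            rows[-1][1].append(frag)  # close enough to the current line
        else:
            rows.append((frag[1], [frag]))  # start a new line
    # Within each row, order the cells left to right by x.
    return [
        [text for _, _, text in sorted(cells, key=lambda f: f[0])]
        for _, cells in rows
    ]


if __name__ == "__main__":
    sample = [
        (50.0, 700.0, "Name"), (200.0, 700.5, "Qty"),
        (50.0, 680.0, "Bolt"), (200.0, 679.8, "12"),
    ]
    for row in group_into_rows(sample):
        for cell in row:  # read cell by cell
            print(cell)
```

This only handles the easy case: it will misread cells that wrap onto several lines, merged cells, and tables whose columns are separated by ruling lines rather than whitespace, so treat it as a starting point rather than a real table recognizer.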