How to read empty table cells from a PDF file in ASP.NET

开发者 (Developer) https://www.devze.com 2023-01-30 12:54 Source: web

I am able to read a PDF file using PDFBox in my ASP.NET application, but it does not add a space for an empty cell in a table. How can I read empty fields from a PDF file using PDFBox in C#? Is there any other way to read the PDF file?

Thanks.


You might be able to pull off this sort of thing if you know exactly where the text should be in advance and can get the locations of the text as you extract it.
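
A minimal sketch of the "known layout" case, assuming you already have each text fragment with the coordinates of its first glyph (the `Fragment` record is hypothetical; fill it from whatever position info your PDF library reports). It is in Java because PDFBox itself is a Java library; a C# version would look almost the same. Column left edges and row top edges are hard-coded, and any cell that receives no text stays as an empty string instead of disappearing. It assumes y grows down the page; flip the comparison if your library reports raw PDF user-space y (which grows upward).

```java
import java.util.Arrays;
import java.util.List;

public class FixedGridTable {
    // Hypothetical type: one extracted text run plus the position of its first glyph.
    record Fragment(double x, double y, String text) {}

    /**
     * Places fragments into a grid whose column left edges (colStarts) and
     * row top edges (rowStarts) are known in advance; both must be sorted ascending.
     * Cells that receive no text stay "", so empty cells are preserved.
     */
    static String[][] toGrid(List<Fragment> fragments, double[] colStarts, double[] rowStarts) {
        String[][] grid = new String[rowStarts.length][colStarts.length];
        for (String[] row : grid) {
            Arrays.fill(row, "");
        }
        for (Fragment f : fragments) {
            int r = bucket(f.y(), rowStarts);
            int c = bucket(f.x(), colStarts);
            if (r < 0 || c < 0) {
                continue; // text lies before the first row/column: not part of the table
            }
            // Several runs can land in one cell (e.g. wrapped text); join them with a space.
            grid[r][c] = grid[r][c].isEmpty() ? f.text() : grid[r][c] + " " + f.text();
        }
        return grid;
    }

    // Index of the last boundary <= value, or -1 if value lies before the first boundary.
    static int bucket(double value, double[] starts) {
        int idx = -1;
        for (int i = 0; i < starts.length; i++) {
            if (value >= starts[i]) {
                idx = i;
            }
        }
        return idx;
    }

    public static void main(String[] args) {
        double[] cols = {50, 150, 250};
        double[] rows = {100, 120};
        List<Fragment> frags = List.of(
                new Fragment(55, 105, "Name"), new Fragment(155, 105, "Qty"), new Fragment(255, 105, "Price"),
                new Fragment(55, 125, "Apple"), new Fragment(255, 125, "1.20")); // Qty cell is empty
        for (String[] row : toGrid(frags, cols, rows)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The empty "Qty" cell in the second row comes out as a blank column between the separators, which is exactly the information plain text extraction throws away.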

If you don't know in advance where the rows and cells are, you'll have to guess based on the text locations. This will not be easy.
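
To give an idea of what that guessing involves, here is one simple heuristic, sketched under loud assumptions: text runs are grouped into rows when their y values are within a tolerance, and the first (header) row is assumed to have text in every column, so its x positions serve as column anchors. Every other run goes to the nearest anchor. The `Fragment` record and the tolerance value are hypothetical, and this breaks down on wrapped text, merged cells, or a header row with a blank cell.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GuessedGridTable {
    // Hypothetical type: one extracted text run plus the position of its first glyph.
    record Fragment(double x, double y, String text) {}

    // Groups runs whose y lies within `tol` of the first run in the current row.
    static List<List<Fragment>> groupRows(List<Fragment> fragments, double tol) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble(Fragment::y).thenComparingDouble(Fragment::x));
        List<List<Fragment>> rows = new ArrayList<>();
        for (Fragment f : sorted) {
            List<Fragment> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last != null && Math.abs(f.y() - last.get(0).y()) <= tol) {
                last.add(f);
            } else {
                List<Fragment> row = new ArrayList<>();
                row.add(f);
                rows.add(row);
            }
        }
        return rows;
    }

    // Assumes the header row is complete: its x positions become the column anchors.
    static String[][] guessGrid(List<Fragment> fragments, double rowTol) {
        List<List<Fragment>> rows = groupRows(fragments, rowTol);
        if (rows.isEmpty()) {
            return new String[0][0];
        }
        double[] anchors = rows.get(0).stream().mapToDouble(Fragment::x).sorted().toArray();
        String[][] grid = new String[rows.size()][anchors.length];
        for (int r = 0; r < rows.size(); r++) {
            Arrays.fill(grid[r], "");
            for (Fragment f : rows.get(r)) {
                int c = nearest(f.x(), anchors);
                grid[r][c] = grid[r][c].isEmpty() ? f.text() : grid[r][c] + " " + f.text();
            }
        }
        return grid;
    }

    // Index of the anchor closest to x.
    static int nearest(double x, double[] anchors) {
        int best = 0;
        for (int i = 1; i < anchors.length; i++) {
            if (Math.abs(x - anchors[i]) < Math.abs(x - anchors[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Fragment> frags = List.of(
                new Fragment(50, 100, "Name"), new Fragment(150, 101, "Qty"), new Fragment(250, 100, "Price"),
                new Fragment(52, 120, "Apple"), new Fragment(251, 119.5, "1.20"),
                new Fragment(51, 140, "Pear"), new Fragment(149, 140, "3"));
        for (String[] row : guessGrid(frags, 3.0)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Note how much this depends on the assumptions: if the header row itself had a blank cell, a whole column would silently vanish, which is why this will not be easy in general.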

In general, extracting data from PDF is ill-advised. PDFs don't have a concept of "tables" (unless the PDF creator goes well out of their way to use "Marked Content", which is still rare). PDFs have lines, glyphs, and images (a pile of pixels). It is Very Hard to reconstruct formatting from that information... and sometimes it is all but impossible.

I don't know if PDFBox will give you the locations of extracted text, but iTextSharp will.
