
How to convert a PDF file to a DataTable

Developer (https://www.devze.com), 2023-03-04 20:18, source: web

Is there any way to convert a PDF file to a DataTable? The PDF file consists mainly of tables. Any help will be highly appreciated.


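using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public class PdfImporter // hypothetical class name; only the method was shown
{
    public DataTable ImportPDF(string filename)
    {
        // Collect the text of every page. GetTextFromPage already returns a .NET
        // string, so no re-encoding step is needed.
        var text = new StringBuilder();
        PdfReader reader = new PdfReader(filename);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                text.Append(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
                text.Append('\n');
            }
        }
        finally
        {
            reader.Close(); // release the file even if extraction fails
        }

        // One string[] per non-blank line, splitting cells on single spaces.
        List<string[]> rows = text.ToString()
            .Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .Select(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .ToList();

        DataTable dtTemp = new DataTable();
        if (rows.Count == 0)
            return dtTemp;

        // The first line supplies the column names; size the table to the widest row.
        int width = rows.Max(r => r.Length);
        for (int i = 0; i < width; i++)
        {
            string name = i < rows[0].Length ? rows[0][i] : "Column" + (i + 1);
            string unique = name;
            for (int n = 2; dtTemp.Columns.Contains(unique); n++)
                unique = name + "_" + n; // DataTable rejects duplicate column names
            dtTemp.Columns.Add(unique);
        }

        // Every later line becomes a data row (the header is not repeated as data);
        // short rows leave their trailing cells as DBNull.
        foreach (string[] cells in rows.Skip(1))
            dtTemp.Rows.Add(cells.Cast<object>().ToArray());

        return dtTemp;
    }
}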


If the PDF contains marked content (my blog article http://www.jpedal.org/PDFblog/2010/09/the-easy-way-to-discover-if-a-pdf-file-contains-structured-content/ explains how to check for it), you can extract that structure directly from the PDF file. Otherwise, you will need to extract the text and try to infer the table structure from it.
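If you have to guess the structure from plain text, one possible heuristic (my assumption, not something the answer prescribes) is to treat only wide gaps as column boundaries, so a cell such as "New York" is not split in two the way a single-space split would split it. This only helps when the extractor keeps the horizontal spacing between columns, for example Poppler's `pdftotext -layout`; as far as I know, iTextSharp's `LocationTextExtractionStrategy` usually emits a single space between words, in which case you would need character positions instead. `ColumnGuesser` and `SplitColumns` are hypothetical names used only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ColumnGuesser
{
    // Assumption: two or more consecutive whitespace characters mark a column
    // boundary, while a single space is treated as part of a cell ("New York").
    private static readonly Regex ColumnGap = new Regex(@"\s{2,}");

    // Hypothetical helper: turns layout-preserving extracted text into one
    // string[] of cells per non-blank line.
    public static List<string[]> SplitColumns(string extractedText)
    {
        return extractedText
            .Split('\n')
            .Select(line => line.Trim())           // also drops a trailing '\r'
            .Where(line => line.Length > 0)        // skip blank lines
            .Select(line => ColumnGap.Split(line)) // split on wide gaps only
            .ToList();
    }
}
```

The result has the same shape as the per-line `string[]` list the first answer builds, so it could replace that single-space split; whether two spaces is the right threshold depends entirely on how your PDFs are laid out.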
