iTextSharp PDF pages import memory issue

I am using this code to import pages from different PDF files into a single document. When I import large files (200 pages or more), I get an OutOfMemoryException. Am I doing something wrong here?

    private bool SaveToFile(string fileName)
    {
        try
        {
            iTextSharp.text.Document doc;
            iTextSharp.text.pdf.PdfCopy pdfCpy;
            string output = fileName;

            doc = new iTextSharp.text.Document();
            pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(output, System.IO.FileMode.Create));
            doc.Open();

            foreach (DataGridViewRow item in dvSourcePreview.Rows)
            {
                string pdfFileName = item.Cells[COL_FILENAME].Value.ToString();
                int pdfPageIndex = int.Parse(item.Cells[COL_PAGE_NO].Value.ToString());
                pdfPageIndex += 1;

                iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(pdfFileName);
                int pageCount = reader.NumberOfPages;

                // set page size for the documents
                doc.SetPageSize(reader.GetPageSizeWithRotation(1));

                iTextSharp.text.pdf.PdfImportedPage page = pdfCpy.GetImportedPage(reader, pdfPageIndex);
                pdfCpy.AddPage(page);

                reader.Close();
            }

            doc.Close();

            return true;
        }
        catch (Exception ex)
        {
            return false;
        }
    }

You're creating a new PdfReader for each pass. That's horribly inefficient. And because you've got a PdfImportedPage from each one, all those (probably redundant) PdfReader instances are never GC'ed.

Suggestions:

1. Two passes. First build a list of files and pages. Then operate on each file in turn, so you only ever have one PdfReader "open" at a time. Call PdfCopy.freeReader() (FreeReader() in iTextSharp) when you're done with a given reader. This will almost certainly change the order in which your pages are added (maybe a Very Bad Thing).
2. One pass. Cache your PdfReader instances based on the file name. Call freeReader() again when you're done... but you probably won't be able to free any of them until you've dropped out of your loop. The caching alone may be enough to keep you from running out of memory.
3. Keep your code as is, but call freeReader() after you close a given PdfReader instance.
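
As a rough illustration of the last suggestion, here is a minimal, untested sketch that keeps the original page order but releases each reader through the copy. It assumes iTextSharp 5.x; the PdfPageMerger class, the CopyPages and Release helpers, and the (file, page) pair list are hypothetical names standing in for the grid rows in the question. It reuses the open reader while consecutive entries point at the same file, and calls FreeReader() just before Close(), which is the order commonly seen in merge samples.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfPageMerger
{
    // Hypothetical helper: 'pages' holds (source file, 1-based page number)
    // pairs in the order they should appear in the output.
    public static void CopyPages(IList<KeyValuePair<string, int>> pages, string outputPath)
    {
        using (var output = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
        {
            var doc = new Document();
            var copy = new PdfCopy(doc, output);
            doc.Open();

            string currentFile = null;
            PdfReader reader = null;

            foreach (var entry in pages)
            {
                // Reuse the open reader while consecutive entries share a file;
                // otherwise release the previous reader before opening the next.
                if (reader == null ||
                    !string.Equals(entry.Key, currentFile, StringComparison.OrdinalIgnoreCase))
                {
                    Release(copy, reader);
                    reader = new PdfReader(entry.Key);
                    currentFile = entry.Key;
                }

                copy.AddPage(copy.GetImportedPage(reader, entry.Value));
            }

            Release(copy, reader);

            // Closing the document expects at least one page to have been added.
            doc.Close();
        }
    }

    // Lets the copy drop its references to the reader, then closes the reader.
    private static void Release(PdfCopy copy, PdfReader reader)
    {
        if (reader == null) return;
        copy.FreeReader(reader);
        reader.Close();
    }
}
```

With this shape only one PdfReader is alive at any time, so memory use should track the largest single source file rather than the number of rows. If the rows jump back and forth between files, the same file is simply reopened, which costs time but not memory.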

I haven't run into any OOM problems with iTextSharp. Were the PDFs created with iTextSharp or with something else? Can you isolate the problem to a single PDF or a set of PDFs that might be corrupt? Below is sample code that creates 10 PDFs with 1,000 pages each. Then it creates one more PDF and pulls one random page from those PDFs 500 times. On my machine it takes a little while to run, but I don't see any memory issues. (iText 5.1.1.0)

using System;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //Folder that we will be working in

            string WorkingFolder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Big File PDF Test");

            //Base name of PDFs that we will be creating
            string BigFileBase = Path.Combine(WorkingFolder, "BigFile");

            //Final combined PDF name
            string CombinedFile = Path.Combine(WorkingFolder, "Combined.pdf");

            //Number of "large" files to create
            int NumberOfBigFilesToMakes = 10;

            //Number of pages to put in the files
            int NumberOfPagesInBigFile = 1000;

            //Number of pages to insert into combined file
            int NumberOfPagesToInsertIntoCombinedFile = 500;

            //Create our test directory
            if (!Directory.Exists(WorkingFolder)) Directory.CreateDirectory(WorkingFolder);

            //First step, create a bunch of files with a bunch of pages, hopefully code is self-explanatory
            for (int FileCount = 1; FileCount <= NumberOfBigFilesToMakes; FileCount++)
            {
                using (FileStream FS = new FileStream(BigFileBase + FileCount + ".pdf", FileMode.Create, FileAccess.Write, FileShare.Read))
                {
                    using (iTextSharp.text.Document Doc = new iTextSharp.text.Document(PageSize.LETTER))
                    {
                        using (PdfWriter writer = PdfWriter.GetInstance(Doc, FS))
                        {
                            Doc.Open();
                            for (int I = 1; I <= NumberOfPagesInBigFile; I++)
                            {
                                Doc.NewPage();
                                Doc.Add(new Paragraph("This is file " + FileCount));
                                Doc.Add(new Paragraph("This is page " + I));
                            }
                            Doc.Close();
                        }
                    }
                }
            }

            //Second step, loop around pulling random pages from random files

            //Create our output file
            using (FileStream FS = new FileStream(CombinedFile, FileMode.Create, FileAccess.Write, FileShare.Read))
            {
                using (Document Doc = new Document())
                {
                    using (PdfCopy pdfCopy = new PdfCopy(Doc, FS))
                    {
                        Doc.Open();

                        //Setup some variables to use in the loop below
                        PdfReader reader = null;
                        PdfImportedPage page = null;
                        int RanFileNum = 0;
                        int RanPageNum = 0;

                        //Standard random number generator
                        Random R = new Random();

                        for (int I = 1; I <= NumberOfPagesToInsertIntoCombinedFile; I++)
                        {
                            //Just to output our current progress
                            Console.WriteLine(I);

                            //Get a random page and file. Remember iText pages are 1-based.
                            RanFileNum = R.Next(1, NumberOfBigFilesToMakes + 1);
                            RanPageNum = R.Next(1, NumberOfPagesInBigFile + 1);

                            //Open the random file
                            reader = new PdfReader(BigFileBase + RanFileNum + ".pdf");
                            //Set the page size from the source file's first page
                            Doc.SetPageSize(reader.GetPageSizeWithRotation(1));

                            //Grab a random page
                            page = pdfCopy.GetImportedPage(reader, RanPageNum);
                            //Add it to the combined file
                            pdfCopy.AddPage(page);

                            //Clean up
                            reader.Close();
                        }

                        //Clean up
                        Doc.Close();
                    }
                }
            }

        }
    }
}