iTextSharp: trimming a PDF document's pages

I have a PDF document with form fields that I'm filling out programmatically with C#. Depending on which of three conditions applies, I need to trim (delete) some of the pages from that document.

Is that possible to do?

For condition 1: I need to keep pages 1-4 but delete pages 5 and 6

For condition 2: I need to keep pages 1-4, delete page 5, and keep page 6

For condition 3: I need to keep pages 1-5 but delete page 6

Use PdfReader.SelectPages() combined with PdfStamper. The code below uses iTextSharp 5.5.1.

public void SelectPages(string inputPdf, string pageSelection, string outputPdf)
{
    using (PdfReader reader = new PdfReader(inputPdf))
    {
        reader.SelectPages(pageSelection);

        using (PdfStamper stamper = new PdfStamper(reader, File.Create(outputPdf)))
        {
            stamper.Close();
        }
    }
}

Then you call this method with the correct page selection for each condition.

Condition 1:

SelectPages(inputPdf, "1-4", outputPdf);

Condition 2:

SelectPages(inputPdf, "1-4,6", outputPdf);

or, equivalently, select every page and then exclude page 5:

SelectPages(inputPdf, "1-6,!5", outputPdf);

Condition 3:

SelectPages(inputPdf, "1-5", outputPdf);

Here's the comment from the iTextSharp source code on what makes up a page selection. This is in the SequenceList class which is used to process a page selection:

/**
* This class expands a string into a list of numbers. The main use is to select a
* range of pages.
* <p>
* The general systax is:<br>
* [!][o][odd][e][even]start-end
* <p>
* You can have multiple ranges separated by commas ','. The '!' modifier removes the
* range from what is already selected. The range changes are incremental, that is,
* numbers are added or deleted as the range appears. The start or the end, but not both, can be ommited.
*/
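
To make the syntax concrete, here is a minimal sketch of how such a selection string could expand into page numbers. It is not iTextSharp's SequenceList implementation: the class name PageSelectionSketch and its Expand method are made up for illustration, and the sketch handles only comma-separated numbers, ascending start-end ranges with one side optionally omitted, and the '!' modifier. It ignores the odd/even filters and never lists a page twice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT part of iTextSharp: a simplified illustration of
// the incremental add/remove behaviour described in the comment above.
public static class PageSelectionSketch
{
    public static List<int> Expand(string selection, int pageCount)
    {
        var pages = new List<int>();
        foreach (string raw in selection.Split(','))
        {
            string token = raw.Trim();
            if (token.Length == 0) continue;

            // A leading '!' removes the range from what is already selected.
            bool remove = token[0] == '!';
            if (remove) token = token.Substring(1);

            int start, end;
            int dash = token.IndexOf('-');
            if (dash < 0)
            {
                // A single page number.
                start = end = int.Parse(token);
            }
            else
            {
                // An omitted start means page 1; an omitted end means the last page.
                string left = token.Substring(0, dash);
                string right = token.Substring(dash + 1);
                start = left.Length == 0 ? 1 : int.Parse(left);
                end = right.Length == 0 ? pageCount : int.Parse(right);
            }

            // Ranges are applied in the order they appear, clamped to the document.
            for (int p = Math.Max(start, 1); p <= Math.Min(end, pageCount); p++)
            {
                if (remove) pages.Remove(p);
                else if (!pages.Contains(p)) pages.Add(p);
            }
        }
        return pages;
    }
}
```

With this sketch, both PageSelectionSketch.Expand("1-4,6", 6) and PageSelectionSketch.Expand("1-6,!5", 6) produce the pages 1, 2, 3, 4, 6, which is why the two selections for condition 2 are interchangeable.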

Instead of deleting pages from a document, what you actually do is create a new document and import only the pages that you want to keep. Below is a full working WinForms app that does that (targeting iTextSharp 5.1.1.0). The last parameter to the function removePagesFromPdf is an array of pages to keep.

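The code below works off of physical files but would be very easy to convert to something based on streams so that you don't have to write to disk if you don't want to. Be aware that pages imported through PdfWriter carry only their page content; interactive features such as form fields and annotations are not copied.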

using System;
using System.ComponentModel;
using System.IO;
using System.Linq;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text;


namespace Full_Profile1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //The files that we are working with
            string sourceFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
            string sourceFile = Path.Combine(sourceFolder, "Test.pdf");
            string destFile = Path.Combine(sourceFolder, "TestOutput.pdf");

            //Remove all pages except 1,2,3,4 and 6
            removePagesFromPdf(sourceFile, destFile, 1, 2, 3, 4, 6);
            this.Close();
        }
        public void removePagesFromPdf(String sourceFile, String destinationFile, params int[] pagesToKeep)
        {
            //Used to pull individual pages from our source
            PdfReader r = new PdfReader(sourceFile);
            //Create our destination file
            using (FileStream fs = new FileStream(destinationFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (Document doc = new Document())
                {
                    using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
                    {
                        //Open the destination for writing
                        doc.Open();
                        //Loop through each page that we want to keep
                        foreach (int page in pagesToKeep)
                        {
                            //Add a new blank page to destination document
                            doc.NewPage();
                            //Extract the given page from our reader and add it directly to the destination PDF
                            w.DirectContent.AddTemplate(w.GetImportedPage(r, page), 0, 0);
                        }
                        //Close our document
                        doc.Close();
                    }
                }
            }
            //Release the source file now that the destination has been written
            r.Close();
        }
    }
}
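
As a rough illustration of the stream-based conversion mentioned before the listing, the same import loop could write into a MemoryStream and return the bytes. This is only a sketch under assumptions: the class name PdfStreamTrimmer and the method KeepPages are invented for this example, and the code assumes an iTextSharp 5.x version in which Document is disposable, as the listing above already relies on.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper for illustration; the names are not from iTextSharp.
public static class PdfStreamTrimmer
{
    // Builds a new PDF in memory that contains only the listed pages
    // (1-based page numbers) of the source PDF.
    public static byte[] KeepPages(byte[] sourcePdf, params int[] pagesToKeep)
    {
        PdfReader r = new PdfReader(sourcePdf);
        try
        {
            using (MemoryStream ms = new MemoryStream())
            {
                using (Document doc = new Document())
                {
                    PdfWriter w = PdfWriter.GetInstance(doc, ms);
                    doc.Open();
                    foreach (int page in pagesToKeep)
                    {
                        //Start a new page, then draw the imported page onto it
                        doc.NewPage();
                        w.DirectContent.AddTemplate(w.GetImportedPage(r, page), 0, 0);
                    }
                } //Disposing the document closes it and finishes the PDF

                //ToArray still works after the writer has closed the stream
                return ms.ToArray();
            }
        }
        finally
        {
            //The reader must stay open until the document has been closed
            r.Close();
        }
    }
}
```

A call such as PdfStreamTrimmer.KeepPages(pdfBytes, 1, 2, 3, 4, 6) would then mirror the file-based example without touching the disk; pdfBytes here stands for whatever byte[] holds the source PDF.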

Here is the code I use to copy all but the last page of an existing PDF. Everything happens in memory streams. The variable pdfByteArray is a byte[] of the original PDF obtained using ms.ToArray(). pdfByteArray is overwritten with the new PDF.

        PdfReader originalPDFReader = new PdfReader(pdfByteArray);

        using (MemoryStream msCopy = new MemoryStream())
        {
           using (Document docCopy = new Document())
           {
              using (PdfCopy copy = new PdfCopy(docCopy, msCopy))
              {
                 docCopy.Open();
                 for (int pageNum = 1; pageNum <= originalPDFReader.NumberOfPages - 1; pageNum ++)
                 {
                    copy.AddPage(copy.GetImportedPage(originalPDFReader, pageNum ));
                 }
                 docCopy.Close();
              }
           }

           pdfByteArray = msCopy.ToArray();
        }

        originalPDFReader.Close();

I know this is an old post, but I have extended @chris-haas's solution a step further.

It deletes the selected pages and then saves the result into a separate PDF file.

//ms is a MemoryStream and fs is a FileStream

ms.CopyTo(fs);

This saves the stream to a separate PDF file. It has worked for me without errors.

pageRange="5"

pageRange="2,15-20"

pageRange="1-5,15-20"

You can pass pageRange values like the samples above.

private void DeletePagesNew(string pageRange, string SourcePdfPath, string OutputPdfPath, string Password = "")
{
    try
    {
        var pagesToDelete = new List<int>();

        if (pageRange.IndexOf(",") != -1)
        {
            var tmpHold = pageRange.Split(',');

            foreach (string nonconseq in tmpHold)
            {

                if (nonconseq.IndexOf("-") != -1)
                {
                    var rangeHold = nonconseq.Split('-');

                    for (int i = Convert.ToInt32(rangeHold[0]), loopTo = Convert.ToInt32(rangeHold[1]); i <= loopTo; i++)
                        pagesToDelete.Add(i);
                }
                else
                {
                    pagesToDelete.Add(Convert.ToInt32(nonconseq));
                }
            }
        }

        else if (pageRange.IndexOf("-") != -1)
        {
            var rangeHold = pageRange.Split('-');

            for (int i = Convert.ToInt32(rangeHold[0]), loopTo1 = Convert.ToInt32(rangeHold[1]); i <= loopTo1; i++)
                pagesToDelete.Add(i);
        }
        else
        {
            pagesToDelete.Add(Convert.ToInt32(pageRange));
        }

        var Reader = new PdfReader(SourcePdfPath);
        int[] pagesToKeep;
        pagesToKeep = Enumerable.Range(1, Reader.NumberOfPages).ToArray();

        using (var ms = new MemoryStream())
        {

            using (var fs = new FileStream(OutputPdfPath, FileMode.Create, FileAccess.Write, FileShare.None))
            {

                using (var doc = new Document())
                {

                    using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
                    {
                        doc.Open();

                        foreach (int p in pagesToKeep)
                        {
                            if (pagesToDelete.Contains(p))
                            {
                                continue;
                            }

                            // doc.NewPage()
                            // w.DirectContent.AddTemplate(w.GetImportedPage(Reader, p), 0, 0)
                            // 
                            doc.SetPageSize(Reader.GetPageSize(p));
                            doc.NewPage();
                            PdfContentByte cb = w.DirectContent;
                            PdfImportedPage pageImport = w.GetImportedPage(Reader, p);
                            int rot = Reader.GetPageRotation(p);

                            if (rot == 90 || rot == 270)
                            {
                                cb.AddTemplate(pageImport, 0, -1.0f, 1.0f, 0, 0, Reader.GetPageSizeWithRotation(p).Height);
                            }
                            else
                            {
                                cb.AddTemplate(pageImport, 1.0f, 0, 0, 1.0f, 0, 0);
                            }

                            cb = default;
                            pageImport = default;
                            rot = default;
                        }

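                        // Note: nothing is ever written to ms, so this copy adds no bytes;
                        // the PdfWriter above writes directly into fs.
                        ms.CopyTo(fs);
                        fs.Flush();
                        doc.Close();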
                    }
                }

            }
        }

        pagesToDelete = null;
        Reader.Close();
        Reader = default;
    }

    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);

    }
}