Reading PDF Annotations with iText

Developer https://www.devze.com 2023-02-20 17:59 Source: Internet
I'm trying to get the contents of a PDF annotation as a string so I can store that information in a database for searching purposes.

Does anyone know how to accomplish this using iText/iTextSharp?


Yes, but the specifics really depend on what kind[s] of annotations you're talking about.

In general:

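// Page numbers in iText are 1-based.
PdfDictionary pageDict = myPdfReader.getPageN(firstPageIsOne);

// getAsArray returns null when the page has no /Annots entry.
PdfArray annotArray = pageDict.getAsArray(PdfName.ANNOTS);

if (annotArray != null) {
  for (int i = 0; i < annotArray.size(); ++i) {
    PdfDictionary curAnnot = annotArray.getAsDict(i);

    // Placeholder helper: typically based on curAnnot.getAsName(PdfName.SUBTYPE).
    int someType = myCodeToGetAnAnnotsType(curAnnot);
    if (someType == THIS_TYPE) {
      writeThisType(curAnnot);
    } else if (someType == THAT_TYPE) {
      writeThatType(curAnnot);
    }
  }
}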

For details, you'll need to consult the PDF specification (ISO 32000-1), in particular the annotation descriptions in section 12.5.6, "Annotation Types".
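As a rough illustration of the type check in the loop above, here is a minimal sketch that maps an annotation's /Subtype name to a coarse category. The enum AnnotKind, the class AnnotKinds and the method classify are hypothetical names, not part of iText. The sketch assumes you have already obtained the subtype as a string, for example from curAnnot.getAsName(PdfName.SUBTYPE) in iText 5. The subtype names themselves come from the PDF specification.

```java
// Hypothetical categories for illustration; not an iText type.
enum AnnotKind { STICKY_NOTE, FREE_TEXT, TEXT_MARKUP, LINK, OTHER }

// Hypothetical helper class; not part of iText.
final class AnnotKinds {
    private AnnotKinds() {}

    /** Maps a /Subtype name such as "/Text" or "Highlight" to a coarse category. */
    static AnnotKind classify(String subtype) {
        if (subtype == null) {
            return AnnotKind.OTHER;
        }
        // Accept the name with or without the leading slash.
        String name = subtype.startsWith("/") ? subtype.substring(1) : subtype;
        switch (name) {
            case "Text":
                return AnnotKind.STICKY_NOTE;   // sticky note
            case "FreeText":
                return AnnotKind.FREE_TEXT;     // text drawn directly on the page
            case "Highlight":
            case "Underline":
            case "Squiggly":
            case "StrikeOut":
                return AnnotKind.TEXT_MARKUP;   // markup over existing page text
            case "Link":
                return AnnotKind.LINK;
            default:
                return AnnotKind.OTHER;         // everything else, e.g. Widget or Popup
        }
    }
}
```

PDF names are case-sensitive, so "text" would fall through to OTHER. Which categories are worth indexing for search depends on your documents; the grouping here is only one possible choice.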

If you can tell us what types you care about, I can be of more help.


For future reference, for anyone who finds this question via Google like I did...

If what you want to do is find a sticky note annotation's name and contents, you can do this (based in part on Mark's answer):

using System;
using iTextSharp.text.pdf;

PdfReader reader = new PdfReader(somePDF);
PdfDictionary pageDict = reader.GetPageN(1); // pages are 1-based

PdfArray annotArray = pageDict.GetAsArray(PdfName.ANNOTS);

// GetAsArray returns null when the page has no annotations.
if (annotArray != null)
{
    for (int i = 0; i < annotArray.Size; ++i)
    {
        PdfDictionary curAnnot = annotArray.GetAsDict(i);

        // /T holds the annotation's title (usually the author); /Contents holds its text.
        PdfString name = curAnnot.GetAsString(PdfName.T);
        PdfString contents = curAnnot.GetAsString(PdfName.CONTENTS);
        if (!string.IsNullOrWhiteSpace(name?.ToString()))
        { Console.WriteLine(name); }
        if (!string.IsNullOrWhiteSpace(contents?.ToString()))
        { Console.WriteLine(contents); }
    }
}
reader.Close();

Additionally, to help identify what you might be looking for, you can open the PDF in a text editor and search for /Annot (PDF names are case-sensitive). You'll quickly find your annotation objects, provided the file does not store them in compressed object streams.
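If you would rather not eyeball the file, the same check can be scripted. This is a minimal sketch using only the Java standard library, not iText; the class name AnnotScan and the method countAnnotMarkers are made up for illustration. It counts occurrences of the name /Annot in the raw bytes while skipping /Annots, the key of the page-level array. It will miss annotations stored in compressed object streams and annotation dictionaries that omit the optional /Type entry, so treat the result as a hint only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it does not parse the PDF structure.
final class AnnotScan {
    // Matches the name /Annot, but not longer names such as /Annots.
    private static final Pattern ANNOT = Pattern.compile("/Annot(?![A-Za-z0-9])");

    private AnnotScan() {}

    /** Counts /Annot names in the raw (uncompressed) text of a PDF file. */
    static int countAnnotMarkers(String rawPdfText) {
        Matcher m = ANNOT.matcher(rawPdfText);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AnnotScan <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String raw = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(countAnnotMarkers(raw) + " uncompressed /Annot marker(s) found");
    }
}
```

A count of zero does not prove the file has no annotations; in that case, reading the /Annots array through iText as shown above is the reliable route.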


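// Requires: using System; using iTextSharp.text.pdf;
try
{
    PdfReader reader = new PdfReader(@"D:\Books_Dir\101\101.pdf");
    for (int i = 1; i <= reader.NumberOfPages; i++)
    {
        PdfDictionary pdfDictionary = reader.GetPageN(i);
        PdfArray annotsArray = pdfDictionary.GetAsArray(PdfName.ANNOTS);

        // Pages without annotations have no /Annots entry.
        if (annotsArray == null) continue;

        for (int j = 0; j < annotsArray.Size; j++)
        {
            PdfDictionary annot = annotsArray.GetAsDict(j);
            // /Subtype identifies the annotation kind, e.g. /Text or /Highlight.
            PdfName subtype = annot.GetAsName(PdfName.SUBTYPE);
            PdfString name = annot.GetAsString(PdfName.T);
            PdfString contents = annot.GetAsString(PdfName.CONTENTS);
            Console.WriteLine("Page {0}: {1} {2} {3}", i, subtype, name, contents);
        }
    }
    reader.Close();
}
catch (Exception ex)
{
    // Report the failure instead of silently swallowing it.
    Console.Error.WriteLine(ex.Message);
}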