PDFBox not recognizing a link

devze.com (https://www.devze.com), 2023-03-30 03:11, source: web
I'm using Apache PDFBox to scan through a PDF in search of links to a certain file.

I've got about a thousand PDFs to scan, and most of the links (in fact, all but one as far as I can tell) are found.

However, there is one particular link in one PDF that PDFBox simply ignores. If I open the PDF in Foxit and check the link's properties, it looks exactly like all the other links (which do get found).

Here's the code I use to iterate through the links:

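    // `pages` is presumably the raw List from document.getDocumentCatalog().getAllPages() (PDFBox 1.x API)
    for( Object p : pages ) {
        PDPage page = (PDPage)p;

        // getAnnotations() returns the annotations referenced from the page's /Annots array
        List<?> annotations = page.getAnnotations();
        for( Object a : annotations ) {
            PDAnnotation annotation = (PDAnnotation)a;

            // Only link annotations are of interest here
            if( annotation instanceof PDAnnotationLink ) {
                PDAnnotationLink link = (PDAnnotationLink)annotation;

                /* Do stuff with the link */
            }
        }

    }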

In the affected PDF, page.getAnnotations() does return an empty list.
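To cross-check that empty list against the file itself, one option is to pull the object references out of the page's /Annots array by hand, for example from a page dictionary copied out of a PDF inspector. The sketch below is a hypothetical, stdlib-only helper (`AnnotsRefs` and `annotObjectNumbers` are illustrative names, not PDFBox API). It assumes the array is written inline as uncompressed text, and it returns an empty list if /Annots is an indirect reference instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnnotsRefs {
    // Inline /Annots array, e.g. "/Annots[12 0 R 15 0 R]"; group 1 is the array body
    private static final Pattern ANNOTS = Pattern.compile("/Annots\\s*\\[([^\\]]*)\\]");
    // Indirect reference "objNum genNum R"
    private static final Pattern REF = Pattern.compile("(\\d+)\\s+(\\d+)\\s+R");

    /** Object numbers listed in an inline /Annots array; empty if there is no inline array. */
    static List<Integer> annotObjectNumbers(String pageDict) {
        List<Integer> result = new ArrayList<>();
        Matcher annots = ANNOTS.matcher(pageDict);
        if (!annots.find()) {
            return result; // absent, or written as an indirect reference such as "/Annots 40 0 R"
        }
        Matcher ref = REF.matcher(annots.group(1));
        while (ref.find()) {
            result.add(Integer.parseInt(ref.group(1)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical page dictionary, for illustration only
        String page = "<</Type/Page/Parent 3 0 R/Annots[12 0 R 15 0 R]>>";
        System.out.println(annotObjectNumbers(page)); // prints [12, 15]
    }
}
```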

Is there any other type of link besides the annotations that I should be aware of?
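One rough way to see whether a file contains more link annotations than PDFBox reports is to count link dictionaries directly in the raw bytes and compare that number with how many `PDAnnotationLink` objects the loop finds. This is a stdlib-only sketch with hypothetical names (`RawLinkScan`, `countRawLinks`). It only sees objects stored as plain text, so dictionaries inside compressed object streams (PDF 1.5 and later) will typically not be counted.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawLinkScan {
    // "/Subtype/Link" with optional whitespace; \b rejects longer names such as "/LinkX"
    private static final Pattern LINK = Pattern.compile("/Subtype\\s*/Link\\b");

    /** Counts link-annotation dictionaries visible in the raw, uncompressed bytes. */
    static int countRawLinks(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data cannot break decoding
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher m = LINK.matcher(raw);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java RawLinkScan file.pdf");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Raw /Subtype/Link dictionaries: " + countRawLinks(bytes));
    }
}
```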


I took a look at the annotation dictionary. It looks like this:

    <</A 1207 0 R/BS<</D[3.0]/S/D/Type/Border/W 0>>/Border[0 0 0[3.0]]/C[1.0 0.0 0.0]/H/I/Rect[56.4168 621.404 547.686 639.787]/Subtype/Link/Type/Annot>>

I can't see anything wrong with it. It is also referenced correctly from the page's /Annots entry. Sorry, I can't be of more help.
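The dictionary can also be checked mechanically. Here is a minimal stdlib-only sketch (the class and method names are hypothetical) that reads /Subtype and the /Rect corners with regular expressions and computes the link's width and height. It assumes the dictionary is plain uncompressed text like the one quoted; for this annotation it should report a Link with a non-zero area, consistent with it looking normal in Foxit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnnotDictCheck {
    // Name following /Subtype, e.g. "Link" (the nested /S/D entry does not match)
    private static final Pattern SUBTYPE = Pattern.compile("/Subtype\\s*/(\\w+)");
    // Four numbers of /Rect: two diagonally opposite corners
    private static final Pattern RECT = Pattern.compile(
            "/Rect\\s*\\[\\s*([-\\d.]+)\\s+([-\\d.]+)\\s+([-\\d.]+)\\s+([-\\d.]+)\\s*\\]");

    /** Returns the /Subtype name, or null if absent. */
    static String subtype(String dict) {
        Matcher m = SUBTYPE.matcher(dict);
        return m.find() ? m.group(1) : null;
    }

    /** Returns {width, height} of /Rect, or null if absent; abs() tolerates either corner order. */
    static double[] rectSize(String dict) {
        Matcher m = RECT.matcher(dict);
        if (!m.find()) {
            return null;
        }
        double x1 = Double.parseDouble(m.group(1));
        double y1 = Double.parseDouble(m.group(2));
        double x2 = Double.parseDouble(m.group(3));
        double y2 = Double.parseDouble(m.group(4));
        return new double[] { Math.abs(x2 - x1), Math.abs(y2 - y1) };
    }

    public static void main(String[] args) {
        // The annotation dictionary quoted above
        String dict = "<</A 1207 0 R/BS<</D[3.0]/S/D/Type/Border/W 0>>/Border[0 0 0[3.0]]"
                + "/C[1.0 0.0 0.0]/H/I/Rect[56.4168 621.404 547.686 639.787]/Subtype/Link/Type/Annot>>";
        double[] size = rectSize(dict);
        System.out.println("Subtype: " + subtype(dict));
        System.out.printf("Rect: %.4f x %.3f%n", size[0], size[1]);
    }
}
```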
