Given a page from a PDF document, I would like to be able to find the margin for the text, using Objective-C.
I realise there are already many questions relating to CGPDF..., but I have not been able to find anything useful. I have also had a look at the PDF specification doc. I am sure it must be in there somewhere, but I have not been able to find it yet.
Example
I create a Word document which has a left and right margin of 2.5cm each. I then print to PDF. Taking this PDF, is there some way to figure out the width of the text (i.e., the left and right page margins)?
Background
In case I am barking up the wrong tree, the reason I am asking this question is to be able to zoom like iBooks zooms. If you double tap in iBooks, it will zoom to the width of the main body. This is the same in the Mac's Preview application (pressing "Zoom to Fit").
First thoughts
I first thought that maybe PDF boxes (CGPDFPage) like kCGPDFBleedBox might be able to help, but it does not look like they will help in my case.
Update
I am only concerned with the body text of the page. Images etc. that might lie outside it do not bother me.
Related posts
Fast and Lean PDF Viewer for iPhone / iPad / iOs - tips and hints?
I'm not familiar with Apple's "Zoom to Fit" feature and its exact behavior (though I can imagine its most important property)...
One potential disadvantage of relying on the different *Box values (MediaBox, CropBox, TrimBox, BleedBox and (the deprecated) ArtBox) is that the real white space may still be different (mostly bigger) from their returned values.
Ghostscript has a special device called bbox which returns the "bounding box" of each page's rendered content. Example:
gswin32c.exe ^
-o nul: ^
-sDEVICE=bbox ^
input.pdf
returns (for a random 3 page example I tried this command with):
%%BoundingBox: 86 122 509 719
%%HiResBoundingBox: 86.993997 122.993996 508.013984 718.001978
%%BoundingBox: 103 199 152 271
%%HiResBoundingBox: 103.408098 199.998064 151.107956 270.897953
%%BoundingBox: 103 195 185 271
%%HiResBoundingBox: 103.208059 195.000041 184.000002 270.897953
You can probably ignore the high-precision HiResBoundingBox values. This leaves you with:
%%BoundingBox: 86 122 509 719
%%BoundingBox: 103 199 152 271
%%BoundingBox: 103 195 185 271
These four values represent the coordinates of the lower left and upper right corners of a rectangle which surrounds all rendered pixels. The units are PostScript points (72 points == 1 inch).
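To turn one of these bounding boxes into margins, subtract it from the page rectangle. Here is a minimal sketch in plain C (it also compiles inside an Objective-C file). The type and function names are made up for illustration, and it assumes you supply the page rectangle yourself, for example the page's MediaBox:

```c
#include <assert.h>
#include <stdio.h>

/* Rectangle given by its lower-left and upper-right corners, in points. */
typedef struct { double llx, lly, urx, ury; } PDFRect;

/* White space between the page edge and the rendered content, in points. */
typedef struct { double left, bottom, right, top; } PageMargins;

/* Parse one "%%BoundingBox: llx lly urx ury" line as printed by the bbox
 * device. Returns 1 on success, 0 if the line has a different shape
 * (for example a %%HiResBoundingBox line). */
int parse_bbox_line(const char *line, PDFRect *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%%%%BoundingBox: %lf %lf %lf %lf",
                  &out->llx, &out->lly, &out->urx, &out->ury) == 4;
}

/* Margins are the distances between matching edges of page and content. */
PageMargins margins_from_bbox(PDFRect page, PDFRect content)
{
    PageMargins m;
    m.left   = content.llx - page.llx;
    m.bottom = content.lly - page.lly;
    m.right  = page.urx - content.urx;
    m.top    = page.ury - content.ury;
    return m;
}
```

With a page rectangle of 0 0 595 842 and the first bounding box above (86 122 509 719), this gives left 86, bottom 122, right 86 and top 123 points.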
Compare this to the *Box values as returned by pdfinfo.exe:
pdfinfo ^
-f 1 ^
-l 3 ^
-box ^
input.pdf
[....]
Page 1 size: 421 x 595 pts (A5)
Page 2 size: 421 x 595 pts (A5)
Page 3 size: 92 x 80 pts
Page 1 MediaBox: 0.00 0.00 595.00 842.00
Page 1 CropBox: 87.00 123.00 508.00 718.00
Page 1 BleedBox: 87.00 123.00 508.00 718.00
Page 1 TrimBox: 87.00 123.00 508.00 718.00
Page 1 ArtBox: 87.00 123.00 508.00 718.00
Page 2 MediaBox: 0.00 0.00 595.00 842.00
Page 2 CropBox: 87.00 123.00 508.00 718.00
Page 2 BleedBox: 87.00 123.00 508.00 718.00
Page 2 TrimBox: 87.00 123.00 508.00 718.00
Page 2 ArtBox: 87.00 123.00 508.00 718.00
Page 3 MediaBox: 0.00 0.00 595.00 842.00
Page 3 CropBox: 92.00 194.00 184.00 274.00
Page 3 BleedBox: 92.00 194.00 184.00 274.00
Page 3 TrimBox: 92.00 194.00 184.00 274.00
Page 3 ArtBox: 92.00 194.00 184.00 274.00
[...]
Update: Here is a screenshot showing the thumbnails of the PDF document's 3 pages which I used to demonstrate the differences above:
You can render the PDF page as a bitmap, inspect its pixels and work out the white margins from them. Take a look at this excellent implementation from Skim: http://skim-app.svn.sourceforge.net/viewvc/skim-app/trunk/NSBitmapImageRep_SKExtensions.m?revision=7036&content-type=text%2Fplain
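The idea behind the pixel scan can be sketched in a few lines of plain C. This is not Skim's code, just an illustration under some assumptions: the page has already been rendered into an 8-bit grayscale buffer (row 0 at the top, 255 meaning white), and the names PixelBounds, find_content_bounds and white_threshold are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest rectangle (inclusive pixel coordinates) containing all
 * non-background pixels. */
typedef struct { size_t min_x, min_y, max_x, max_y; } PixelBounds;

/* Scan an 8-bit grayscale bitmap and record the extent of every pixel
 * darker than white_threshold. Returns 1 if any such pixel exists,
 * 0 if the page is blank (in which case *out is left untouched). */
int find_content_bounds(const unsigned char *pixels,
                        size_t width, size_t height, size_t bytes_per_row,
                        unsigned char white_threshold, PixelBounds *out)
{
    assert(pixels != NULL && out != NULL && bytes_per_row >= width);
    int found = 0;
    for (size_t y = 0; y < height; y++) {
        const unsigned char *row = pixels + y * bytes_per_row;
        for (size_t x = 0; x < width; x++) {
            if (row[x] >= white_threshold)
                continue;                 /* treated as background */
            if (!found) {
                out->min_x = out->max_x = x;
                out->min_y = out->max_y = y;
                found = 1;
            } else {
                if (x < out->min_x) out->min_x = x;
                if (x > out->max_x) out->max_x = x;
                out->max_y = y;           /* rows are visited top to bottom */
            }
        }
    }
    return found;
}
```

The result is in bitmap pixels with the origin at the top left, so to get margins in PDF points you would still need to divide by your render scale and flip the y axis. A threshold slightly below 255 helps ignore anti-aliasing noise.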
According to the CGPDF documentation you can get up to five page boxes (media, crop, bleed, trim and art) which define the area in which content is held, printed, cropped, trimmed and so on. Use the CGPDFPageGetBoxRect() function to get those boxes. I'm not sure of their exact purpose so this is just my guess on which boxes you need:
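// pageRef is a CGPDFPageRef, e.g. obtained from CGPDFDocumentGetPage().
CGRect mediaBox = CGPDFPageGetBoxRect(pageRef, kCGPDFMediaBox); // full page
CGRect cropBox = CGPDFPageGetBoxRect(pageRef, kCGPDFCropBox);   // visible region
// Distance from the left page edge to the left edge of the crop box.
CGFloat leftMargin = CGRectGetMinX(cropBox) - CGRectGetMinX(mediaBox);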
In other words - you get page boundaries, and content rectangle boundaries and do the math on them. Shouldn't be too hard once you get the idea of what each box represents.
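Whichever rectangle ends up describing the body text, zooming to its width comes down to one scale factor and one horizontal offset. A rough sketch in plain C, with hypothetical names (ZoomToWidth, zoom_to_content_width) and assuming the content edges and the view width are in the same units before scaling:

```c
#include <assert.h>

/* Scale factor and horizontal translation that make the content span
 * exactly fill the view width. */
typedef struct { double scale, offset_x; } ZoomToWidth;

ZoomToWidth zoom_to_content_width(double content_min_x, double content_max_x,
                                  double view_width)
{
    assert(content_max_x > content_min_x && view_width > 0.0);
    ZoomToWidth z;
    /* Stretch the content width to the view width... */
    z.scale = view_width / (content_max_x - content_min_x);
    /* ...and shift so the content's left edge lands at x = 0. */
    z.offset_x = -content_min_x * z.scale;
    return z;
}
```

For example, content spanning x = 87 to 508 points shown in a view 842 points wide gives a scale of 2 and an offset of -174.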