Merge Two Half-Page PDF Documents with PHP

开发者 (Developer) https://www.devze.com 2023-03-27 16:33 Source: web
A friend of mine works at a newspaper and asked me this on Monday, and I couldn't confirm whether it was possible.

I know it's possible to merge two PDFs using PHP (I've seen many other questions on that already answered), but what I'm not sure of is whether I can merge a half-page PDF so that it fills an empty space in another PDF.

Imagine the following: I have PDF1, a half-page PDF, and PDF2, a three-page PDF. On the first page of PDF2 there is an empty space that PDF1 should fit into.

Can I do this? How?


I can't give you specific source code, but I can explain how to do it at a very low level. What you're looking for is also similar to what the publishing industry calls imposition (or impositioning).

You start out the same way as when merging, which means pulling in pages from another document. You must bring in all dependencies of the page recursively. Watch out for infinite loops: reference cycles do exist in PDF, so you must keep track of visited objects. Don't use recursive functions, because PDF references can be very deep and your call stack will easily overflow. Implement the traversal with an explicit stack on the heap instead (depth-first search is fine).
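To make the traversal concrete, here is a minimal sketch in Python (chosen only for brevity; the same shape works in PHP). It assumes a hypothetical `objects` mapping from an object id to the ids it references, which a real PDF library would give you in its own way. It is an illustration of the idea, not the code I actually use.

```python
def collect_dependencies(objects, root_id):
    """Return the ids of every object reachable from root_id, in DFS order.

    `objects` is a hypothetical mapping: object id -> list of referenced ids.
    """
    visited = set()
    order = []
    stack = [root_id]  # explicit stack on the heap, no recursive calls
    while stack:
        obj_id = stack.pop()
        if obj_id in visited:
            continue  # already handled: this is what breaks reference cycles
        visited.add(obj_id)
        order.append(obj_id)
        # Push children reversed so they are visited in their original order.
        stack.extend(reversed(objects.get(obj_id, [])))
    return order


# Toy graph: page 1 -> resources 2 and contents 3; 2 -> font 4; 4 -> back to 1.
objects = {1: [2, 3], 2: [4], 3: [], 4: [1]}
print(collect_dependencies(objects, 1))  # prints [1, 2, 4, 3]
```

In a real document you would probably also skip the page's /Parent reference during this walk, otherwise the traversal drags in the whole page tree of the source file.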

The key to stamping one PDF onto another is to turn the source Page object into an XObject form (a "form XObject" in the PDF specification; not to be confused with AcroForms or fillable form fields). An XObject form is very similar to a Page object, with the following exceptions:

  • The /Type /Page becomes /Type /XObject /Subtype /Form.
  • The page MediaBox and CropBox together become /BBox in the form. Be careful: both can be inherited via the page tree, so you must look for inherited attributes.
  • The page Rotate (also inheritable) becomes /Matrix, which is a transformation (rotation) matrix instead of an angle.
  • The page's Resources, Group and Metadata can be brought in unchanged and added to the form object.
  • The page Contents stream must be transferred to the form. However, the page Contents is an external object and may be an array of streams, in which case you need to merge the pieces, because the XObject form is itself a single stream object.
  • All other attributes are tricky, and you might want to ignore them if you are unsure.
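The list above could be sketched roughly as follows. This is Python over plain dictionaries, not a real PDF library: the `Parent` links, the `stream` key and all function names are my own stand-ins, content streams are assumed to be already decoded, and boxes are assumed to be normalized as [x0, y0, x1, y1]. Resources is looked up through the page tree as well, since it is inheritable too. The matrices additionally move the rotated box to the origin, which is a design choice rather than a requirement.

```python
def inherited(page, key, default=None):
    """Look up `key` on the page, then walk the /Parent links up the tree."""
    node, seen = page, set()
    while node is not None and id(node) not in seen:
        seen.add(id(node))
        if key in node:
            return node[key]
        node = node.get("Parent")
    return default


def visible_box(page):
    """Intersect CropBox with MediaBox (CropBox defaults to MediaBox)."""
    media = inherited(page, "MediaBox")
    crop = inherited(page, "CropBox", media)
    return [max(media[0], crop[0]), max(media[1], crop[1]),
            min(media[2], crop[2]), min(media[3], crop[3])]


def rotation_matrix(rotate, box):
    """Turn a clockwise /Rotate angle into a [a b c d e f] matrix."""
    x0, y0, x1, y1 = box
    rotate %= 360
    if rotate == 0:
        return [1, 0, 0, 1, -x0, -y0]
    if rotate == 90:
        return [0, -1, 1, 0, -y0, x1]
    if rotate == 180:
        return [-1, 0, 0, -1, x1, y1]
    if rotate == 270:
        return [0, 1, -1, 0, y1, -x0]
    raise ValueError("Rotate must be a multiple of 90")


def page_to_form(page):
    """Build a form XObject dictionary (plus merged stream) from a page."""
    box = visible_box(page)
    contents = page.get("Contents", b"")
    if isinstance(contents, (list, tuple)):
        # Pieces split only at token boundaries, so a newline join is safe.
        contents = b"\n".join(contents)
    form = {
        "Type": "XObject",
        "Subtype": "Form",
        "BBox": box,
        "Matrix": rotation_matrix(inherited(page, "Rotate", 0), box),
        "Resources": inherited(page, "Resources", {}),
        "stream": contents,
    }
    for key in ("Group", "Metadata"):
        if key in page:
            form[key] = page[key]
    return form


# Toy page: MediaBox and Rotate come from the parent node, CropBox is local.
tree = {"MediaBox": [0, 0, 612, 792], "Rotate": 90}
page = {"Parent": tree, "CropBox": [0, 396, 612, 792],
        "Contents": [b"q", b"Q"]}
form = page_to_form(page)
print(form["BBox"], form["Matrix"])
```

Everything the sketch does not copy (annotations, other page attributes) falls under the "tricky" bucket from the last bullet.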

Once this is done, all you have to do is paint the XObject form on the new page. You have to generate a unique name for the XObject and add it to the page's Resources. Painting itself is a cm operator followed by a Do operator, just like painting an image. If you need to crop the original content, then you also need to set a clipping path before Do.
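For illustration, the operators for such a paint could be generated like this. It is a Python sketch with made-up helper names (`unique_name`, `num`, `paint_form`). It assumes the form has an identity /Matrix and that you want it scaled uniformly into a target rectangle given as (x, y, width, height) in page coordinates. Wrapping everything in q and Q is my addition, so that the matrix and the clip do not leak into whatever is drawn afterwards.

```python
def unique_name(existing, prefix="Fm"):
    """Return the first name like Fm0, Fm1, ... not already in `existing`."""
    n = 0
    while f"{prefix}{n}" in existing:
        n += 1
    return f"{prefix}{n}"


def num(value):
    """Format a number compactly for a content stream (no exponent form)."""
    text = f"{value:.4f}".rstrip("0").rstrip(".")
    return "0" if text in ("-0", "") else text


def paint_form(name, bbox, target, clip=True):
    """Return content-stream operators that draw form `name` inside `target`."""
    x0, y0, x1, y1 = bbox
    tx, ty, tw, th = target
    # Uniform scale so the whole form fits without distortion.
    scale = min(tw / (x1 - x0), th / (y1 - y0))
    # Translate so the form's lower-left corner lands on the target's corner.
    e = tx - scale * x0
    f = ty - scale * y0
    ops = ["q"]
    if clip:
        # Clipping path in page coordinates, set before cm and Do.
        ops.append(f"{num(tx)} {num(ty)} {num(tw)} {num(th)} re W n")
    ops.append(f"{num(scale)} 0 0 {num(scale)} {num(e)} {num(f)} cm")
    ops.append(f"/{name} Do")
    ops.append("Q")
    return "\n".join(ops) + "\n"


# The target page already uses Fm0 and Im1 in its /XObject resources.
xobjects = {"Fm0": None, "Im1": None}
name = unique_name(xobjects)
stream = paint_form(name, [0, 0, 612, 396], (36, 400, 306, 198))
print(stream)
```

The resulting text would be appended to the target page's content, and the form registered under `name` in that page's /XObject resources.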

Needless to say, this is far from trivial, and there are lots of pitfalls. I have implemented this and I can tell you it really works, but it's harder than it seems. You need a very good low-level PDF library and a very thorough understanding of the PDF specification.

I haven't discussed some of the other details, such as color management (what if you paint DeviceRGB on managed CMYK?), PDF/A, PDF/X, transferring annotations and form fields, etc.

If this is beyond you, you should look for an open-source impositioning library, because it does pretty much the same thing. Impositioning means placing two or more pages on a blank sheet of paper for the purpose of printing a book or a flyer. I do have a commercial solution as well.
