CGPDFScanner, Identity-H and decompression

Developer https://www.devze.com 2023-03-06 21:49 Source: Internet

My instance of CGPDFScanner is scanning a test pdf file.

At a given point, the current font dictionary has the Encoding value Identity-H and a FontDescriptor dictionary with the key FontFile2. The value of this key is a stream whose dictionary has the key Filter, and the value of that key is FlateDecode.

I'm unsure of how to interpret and use this (to, say, extract the text in the next Tj block to Unicode). For example, do I just zlib-decompress the bytes in the next Tj block? (There is no ToUnicode key here.)

I'd thought all the decompression was carried out by the instance of CGPDFScanner.


If the font uses the Identity-H encoding and does not have a ToUnicode entry, the text cannot be extracted. The operand of the Tj operator is a sequence of glyph indexes, and without a ToUnicode entry there is nothing that maps those indexes back to characters.
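To make the point concrete, here is a minimal Python sketch (not CGPDFScanner code) of what the Tj operand looks like under Identity-H: every two bytes form one big-endian code. The names `split_identity_h` and `to_text`, the sample operand bytes, and the idea of passing the ToUnicode data as a plain dictionary are all assumptions made for illustration.

```python
import struct
from typing import Dict, List, Optional


def split_identity_h(operand: bytes) -> List[int]:
    """Split a Tj string operand into 2-byte big-endian codes (Identity-H)."""
    if len(operand) % 2 != 0:
        raise ValueError("an Identity-H string must have an even number of bytes")
    count = len(operand) // 2
    return list(struct.unpack(f">{count}H", operand))


def to_text(codes: List[int], to_unicode: Optional[Dict[int, str]]) -> Optional[str]:
    """Map codes to text through a ToUnicode-style table.

    Returns None when no table is available, which is the situation
    described in the question.
    """
    if to_unicode is None:
        return None
    # Codes missing from the table become U+FFFD (replacement character).
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


# Hypothetical operand bytes, as they might arrive from a Tj callback.
operand = bytes([0x00, 0x2B, 0x00, 0x48, 0x00, 0x4F])
codes = split_identity_h(operand)
print(codes)                 # [43, 72, 79]
print(to_text(codes, None))  # None: no ToUnicode entry, so no text
```

The numbers printed are only indexes into the font. Without a table such as the hypothetical `to_unicode` dictionary, nothing in the operand itself says which characters they stand for.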

The FontFile2 entry stores the actual font file; it plays no role when extracting text from the PDF file.
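On the zlib part of the question: the FlateDecode filter belongs to the FontFile2 stream, not to the Tj operand. The Python sketch below illustrates the difference with made-up data. The `font_program` bytes are a hypothetical stand-in for an embedded TrueType file (only the leading 0x00010000 version tag is realistic), and `is_flate` is an illustrative helper, not part of any PDF API.

```python
import zlib

# Hypothetical stand-in for an embedded TrueType font program.
# 0x00010000 is the version tag a TrueType file starts with.
font_program = b"\x00\x01\x00\x00" + b"\x00" * 32

# What a FlateDecode-filtered FontFile2 stream would hold: zlib/deflate data.
stream_bytes = zlib.compress(font_program)


def is_flate(data: bytes) -> bool:
    """Return True if the bytes decompress cleanly as a zlib stream."""
    try:
        zlib.decompress(data)
        return True
    except zlib.error:
        return False


# The font stream decompresses back to the font program...
decoded = zlib.decompress(stream_bytes)
print(decoded[:4] == b"\x00\x01\x00\x00")

# ...while a Tj operand (hypothetical bytes) is just a string of glyph codes.
tj_operand = b"\x00\x2b\x00\x48"
print(is_flate(stream_bytes), is_flate(tj_operand))
```

Since the scanner has to parse the content stream to find the operators at all, the Tj operand it hands over is already plain string bytes, and there is nothing left to decompress at that level.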
