Copied text is gibberish when trying to copy text with different fonts from PDF created in Word

Developer https://www.devze.com 2023-02-17 23:12 Source: Internet
I need some help understanding the following case. I have two documents, both created with Word 2002 and CutePDF Writer (File -> Print). Both documents contain the text "Test this text." In one document the font is Times New Roman and in the other it is Palatino. The documents: http://tricky.o3h.se/lajjtis/pdf/palantino_text.pdf

http://tricky.o3h.se/lajjtis/pdf/timesnewr_text.pdf

Now try to copy the text from both documents and paste it into Word or Notepad. The text from palantino_text.pdf comes out as gibberish, but the Times New Roman one is fine. How come? I realize I could change the font, but I don't want to do that. Is there a setting I am missing when generating the PDF document?

Please help


First, you're using a very old version of Ghostscript: 8.15, which was released more than 7 years ago. Try the current release (9.01); Ghostscript has improved and matured a lot over that time.
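To see which Ghostscript version is installed, something like the following sketch may help. It assumes the Ghostscript executable is on PATH under one of its usual names (`gs`, `gswin64c`, `gswin32c`); a converter bundled with a PDF printer may live elsewhere, in which case this check will not find it. The helper names are made up for illustration.

```python
import shutil
import subprocess

# Usual Ghostscript executable names: "gs" on Unix-like systems,
# "gswin64c" / "gswin32c" for the Windows console builds.
CANDIDATES = ("gs", "gswin64c", "gswin32c")


def parse_version(text):
    """Turn a version string such as '8.15' into a comparable tuple (8, 15)."""
    return tuple(int(part) for part in text.strip().split(".")[:2])


def installed_ghostscript_version():
    """Return the version tuple of the first Ghostscript found on PATH, or None."""
    for name in CANDIDATES:
        path = shutil.which(name)
        if path is None:
            continue
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        if result.returncode == 0 and result.stdout.strip():
            return parse_version(result.stdout)
    return None


version = installed_ghostscript_version()
if version is None:
    print("No Ghostscript executable found on PATH")
elif version < parse_version("9.01"):
    print("Ghostscript", version, "is older than 9.01")
else:
    print("Ghostscript", version, "is 9.01 or newer")
```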

Second, the reason for your problem is that palantino_text.pdf uses a so-called "custom encoding" for the font in question, and that font is a TrueType font (support for which came to Ghostscript only belatedly). A custom encoding is free to map the glyph names to almost any arbitrary place in its table of glyphs. A PDF viewer or interpreter can keep track of this potentially rather complicated re-mapping; a simple copy'n'paste operation cannot.

The timesnewr_text.pdf document uses the standard "ANSI encoding" for the embedded font, which is a Type 1 font (support for which is very native to Ghostscript). ANSI encoding maps the glyph names to the drawn shapes (glyphs) in a well-known and well-defined way, so copy'n'paste operations can work flawlessly with it.
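The difference between the two encodings can be illustrated with a toy model. This is only a sketch of the idea, not how Ghostscript or any PDF viewer is implemented: the "custom encoding" here simply numbers glyphs 1, 2, 3, ... in order of first use, which is an assumption made for illustration, and all function names are hypothetical.

```python
# Toy model: the page content stores character codes, not Unicode text.
# The font's encoding decides which glyph each code draws.

def build_custom_encoding(text):
    """Assign codes 1, 2, 3, ... to glyphs in order of first use
    (a made-up stand-in for a subset font with a custom encoding)."""
    glyph_to_code = {}
    for ch in text:
        if ch not in glyph_to_code:
            glyph_to_code[ch] = len(glyph_to_code) + 1
    return glyph_to_code


def render(codes, code_to_glyph):
    """What the viewer draws: it follows the font's own code-to-glyph table."""
    return "".join(code_to_glyph[code] for code in codes)


def naive_copy(codes):
    """What a copy yields when nothing maps the codes back to text:
    the raw codes are treated as ordinary character codes."""
    return "".join(chr(code) for code in codes)


text = "Test this text."

# Standard-style encoding: for plain Latin letters the codes equal the
# usual character codes, so reading them back recovers the text.
standard_codes = [ord(ch) for ch in text]

# Custom encoding: the codes are arbitrary small numbers.
glyph_to_code = build_custom_encoding(text)
code_to_glyph = {code: glyph for glyph, code in glyph_to_code.items()}
custom_codes = [glyph_to_code[ch] for ch in text]

print(render(custom_codes, code_to_glyph))  # drawn correctly on screen
print(repr(naive_copy(standard_codes)))     # copy recovers the text
print(repr(naive_copy(custom_codes)))       # copy yields control characters
```

In both cases the page looks right on screen, because the renderer uses the font's own table. Only the copy step, which has no way back from the arbitrary codes to characters, goes wrong.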

If you upgrade your PDF writer/driver to a more recent one, you may be lucky and even get TrueType fonts embedded in a form where you can copy'n'paste text snippets from the PDF. (Also try to find the setting in your printer driver that tells it how to embed TrueType fonts. With luck you will see a "Convert to Outlines" or "Convert to Type42" setting. Try these!)


I had the same problem after installing Win 8.1. I printed an email from Outlook to the CutePDF printer. After opening the *.pdf, selecting and copying the text, and pasting it into Notepad, I got gibberish.

I checked the CutePDF FAQ at http://www.cutepdf.com/support/faq.asp. It says: "Text characters are wrong or missing in generated PDF file. On Win2000 and up boxes, select CutePDF Writer properties in the application print dialog box and click "Advanced". Select "Download as Softfont" (default is "Substitute with Device Fonts") on TrueType Fonts setting for font embedding. On Win98/ME boxes, open the property of CutePDF Writer and change the Fonts setting to "Always use TrueType fonts"."

But note: do NOT open Advanced from the program (Outlook). Instead use Control Panel / Devices and Printers / CutePDF / Properties / Settings / Advanced / Font-TrueType: Download as Softfont

It looks like you can NOT change this setting from the properties in the application's print dialog box (because you didn't run the program as administrator).
