
How do I figure out the font family and the font size of the words in a pdf document?

Developer https://www.devze.com 2022-12-30 00:18 Source: web

How do I figure out the font family and the font size of the words in a PDF document? We are trying to generate a PDF document programmatically using iText, but we are not sure how to find out the font family and font size used in the original document that we need to reproduce. The document properties don't seem to contain this information.


Fonts are described by dictionary objects inside the PDF (I believe they are referenced from each page's resources). If you open a PDF as a text file, you should be able to find these dictionary entries (they begin and end with "<<" and ">>" respectively), provided the objects are not stored in compressed form.

In a simple PDF file, I found the following:

<</Type/Font/BaseFont/Helvetica-Bold/Subtype/Type1/Encoding/WinAnsiEncoding>>

So searching for the prefix "<</Type/Font" should help. Some PDF files put spaces between the components, in which case searching for '/Type /Font' should work.

Of course this is a manual process, while you would probably prefer an automatic one.
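As a rough way to automate the manual search above, the sketch below reads the whole file as text and collects every name that follows a /BaseFont key. It uses only the Java standard library; the class name RawFontScan and the method baseFonts are made up for this illustration. It will miss fonts whose dictionaries sit inside compressed object streams, so treat it as a quick check rather than a reliable extractor.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText.
class RawFontScan {
    // Matches both "/BaseFont/Helvetica-Bold" and "/BaseFont /Helvetica-Bold".
    // The name ends at whitespace or at a PDF delimiter character.
    private static final Pattern BASE_FONT =
        Pattern.compile("/BaseFont\\s*/([^\\s/<>\\[\\]()]+)");

    // Returns the distinct BaseFont names in order of first appearance.
    static Set<String> baseFonts(String rawPdf) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = BASE_FONT.matcher(rawPdf);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to exactly one char, so the binary
        // parts of the file cannot cause decoding errors.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String raw = new String(bytes, StandardCharsets.ISO_8859_1);
        for (String name : baseFonts(raw)) {
            System.out.println(name);
        }
    }
}
```

Run it with the path of the PDF as the only argument. Names of embedded subset fonts usually carry a six-letter prefix followed by "+", which you can ignore when identifying the family.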

On another note, we sometimes use Identifont or WhatTheFont to identify uncommon fonts that give us problems (logo fonts, for example).

Regards, Guillaume

Edit: the following code lists the fonts used on each page. In short, you look in each page's dictionary for the "Resources" subdictionary, and in that for the "Font" subdictionary. Each entry in the latter is a font dictionary describing one font.

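 // Imports assume iText 5; for iText 2.x use com.lowagie.text.pdf instead.
 import java.io.File;
 import java.io.FileInputStream;
 import com.itextpdf.text.pdf.PdfDictionary;
 import com.itextpdf.text.pdf.PdfName;
 import com.itextpdf.text.pdf.PdfReader;

 PdfReader reader = new PdfReader(
   new FileInputStream(new File("file.pdf")));
 int nbmax = reader.getNumberOfPages();
 System.out.println("nb pages " + nbmax);

 for (int i = 1; i <= nbmax; i++) {
    System.out.println("----------------------------------------");
    System.out.println("Page " + i);
    PdfDictionary page = reader.getPageN(i);
    PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
    PdfDictionary fonts = (resources == null) ? null : resources.getAsDict(PdfName.FONT);
    if (fonts == null) {
       continue; // this page declares no fonts
    }
    // each key is a resource name (e.g. F13), each value a font dictionary
    for (Object key : fonts.getKeys()) {
       PdfName name = (PdfName) key;
       PdfDictionary fontDict = fonts.getAsDict(name);
       if (fontDict == null) {
          continue;
       }
       // either entry may be missing (Type3 fonts have no BaseFont); null is printed then
       System.out.println(name + " " + fontDict.getDirectObject(PdfName.SUBTYPE)
             + " " + fontDict.getDirectObject(PdfName.BASEFONT));
    }
 }
 reader.close();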

The resource name (the variable "name" in the code above) is what the page content uses to select a font. In the PDF, you have to find it next to the text you are interested in. The number that follows it is the font size. In the example here, the font is F13 at size 12. (Sorry, still no code for this part.)

BT
/F13 12 Tf
288 720 Td
(the text to find) Tj
ET
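For the missing part, a minimal sketch of pulling the font name and size out of a content stream could look like the following. It assumes you already have the decoded content stream as text (with iText, the bytes returned by reader.getPageContent(i) should serve) and simply pattern-matches the "name size Tf" operator. The class TfOperatorScan and its method are hypothetical names for this example, and a regular expression is a shortcut, not a real content-stream parser: it can be fooled by a string literal that happens to contain similar text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText.
class TfOperatorScan {
    // "/F13 12 Tf": a resource name, a number, then the Tf operator.
    private static final Pattern TF = Pattern.compile(
        "/([^\\s/<>\\[\\]()]+)\\s+(-?(?:\\d+\\.?\\d*|\\.\\d+))\\s+Tf\\b");

    // Returns one "name size" entry per Tf operator, in stream order.
    static List<String> fontSelections(String contentStream) {
        List<String> selections = new ArrayList<>();
        Matcher m = TF.matcher(contentStream);
        while (m.find()) {
            selections.add(m.group(1) + " " + m.group(2));
        }
        return selections;
    }

    public static void main(String[] args) {
        String sample = "BT\n/F13 12 Tf\n288 720 Td\n(the text to find) Tj\nET\n";
        for (String s : fontSelections(sample)) {
            System.out.println(s);
        }
    }
}
```

Look up the name found here (F13) among the keys of the page's Font dictionary to get the actual BaseFont. Keep in mind that the size given to Tf is scaled by the text matrix, so some producers write a size of 1 and set the real size with a Tm operator instead.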


Depending on the PDF, if the text hasn't been converted to outlines you may be able to open it in Adobe Illustrator, double-click the text, and select some of it to see its font family, size, etc.

If the text has been outlined, use one of the online tools that PATRY suggests to identify the font.

Good luck


If you have Adobe Acrobat you can see the fonts inside and examine the objects and text streams. I wrote a blog post on this at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects
