How to create an XML or a markup file from a PDF using iText?

Developer https://www.devze.com 2023-03-29 08:45 Source: Internet

I want to create a text file for the PDF below:

http://examples.itextpdf.com/results/part4/chapter16/with_font.pdf

The output should be similar to:

<BaseFont:'WaltDisneyScriptv4.1'; Type:'None'; Size:'60'>iText in Action<End>

By searching I found how to list the fonts used in a PDF, but not how to get their size or type (i.e. bold, italic, ...), nor how to relate each font to the text it is applied to.

Where different fonts are used, the output should look like this:

E.g.: <BaseFont:'Courier'; Type:'None'; Size:'45'>iText <End><BaseFont:'WaltDisneyScriptv4.1'; Type:'None'; Size:'60'>in Action<End>
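To pin down the target format, here is a minimal sketch in plain Java (no iText involved) that turns a list of text runs into the markup shown above and merges adjacent runs that share the same font. The class FontMarkup, the Run type and its fields are hypothetical names introduced only for illustration, and the size is assumed to be a whole number as in the examples.

```java
import java.util.Arrays;
import java.util.List;

final class FontMarkup {
    /** One piece of text drawn with a single font (illustrative type, not an iText class). */
    static final class Run {
        final String baseFont;
        final String type;
        final int size;
        final String text;

        Run(String baseFont, String type, int size, String text) {
            this.baseFont = baseFont;
            this.type = type;
            this.size = size;
            this.text = text;
        }

        boolean sameFont(Run other) {
            return baseFont.equals(other.baseFont) && type.equals(other.type) && size == other.size;
        }
    }

    /** Emits one tag per font change; consecutive runs with the same font share a tag. */
    static String toMarkup(List<Run> runs) {
        StringBuilder out = new StringBuilder();
        Run prev = null;
        for (Run run : runs) {
            if (prev == null || !prev.sameFont(run)) {
                if (prev != null) {
                    out.append("<End>");
                }
                out.append(String.format("<BaseFont:'%s'; Type:'%s'; Size:'%d'>",
                        run.baseFont, run.type, run.size));
            }
            out.append(run.text);
            prev = run;
        }
        if (prev != null) {
            out.append("<End>");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
                new Run("Courier", "None", 45, "iText "),
                new Run("WaltDisneyScriptv4.1", "None", 60, "in "),
                new Run("WaltDisneyScriptv4.1", "None", 60, "Action"));
        System.out.println(toMarkup(runs));
    }
}
```

With the three runs in main, the last two are merged because they share a font, so the printed line matches the two-font example above.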

Any assistance is appreciated. Thanks in advance!

Here is some code that I used to find the set of fonts in a PDF.

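import java.util.Map;

import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfStream;

// Assumes iText 5 (package com.itextpdf.text.pdf); place the method inside any class.
// Fills 'set' with one entry per font found in the resource dictionary: font name -> description.
public static void processResource(Map<String, String> set, PdfDictionary resource)
    {
        if (resource == null)
            return;
        // Form XObjects are streams with their own /Resources entry, so fetch each one
        // as a stream and recurse into its resources.
        PdfDictionary xobjects = resource.getAsDict(PdfName.XOBJECT);
        if (xobjects != null)
            {
                for (PdfName key : xobjects.getKeys())
                    {
                        PdfStream xobject = xobjects.getAsStream(key);
                        if (xobject != null)
                            processResource(set, xobject.getAsDict(PdfName.RESOURCES));
                    }
            }
        PdfDictionary fonts = resource.getAsDict(PdfName.FONT);
        if (fonts == null)
            return;
        for (PdfName key : fonts.getKeys())
            {
                PdfDictionary font = fonts.getAsDict(key);
                if (font == null)
                    continue;
                // /BaseFont is missing for Type 3 fonts, so use a placeholder instead of failing.
                PdfName baseFont = font.getAsName(PdfName.BASEFONT);
                String name = baseFont != null ? baseFont.toString() : "/unnamed";
                // Subset fonts are named "/ABCDEF+RealName": a six-letter tag, then '+'.
                if (name.length() > 8 && name.charAt(7) == '+')
                    {
                        name = String.format("%s subset (%s)", name.substring(8), name.substring(1, 7));
                    }
                else
                    {
                        name = name.substring(1); // drop the leading '/'
                        PdfDictionary desc = font.getAsDict(PdfName.FONTDESCRIPTOR);
                        if (desc == null)
                            name += " nofontdescriptor";
                        else if (desc.get(PdfName.FONTFILE) != null)
                            name += " (Type 1) embedded";
                        else if (desc.get(PdfName.FONTFILE2) != null)
                            name += " (TrueType) embedded";
                        else if (desc.get(PdfName.FONTFILE3) != null)
                            name += " (" + font.getAsName(PdfName.SUBTYPE).toString().substring(1) + ") embedded";
                    }
                // /Name is optional in a font dictionary; fall back to the resource key (e.g. /F1).
                PdfName fontName = font.getAsName(PdfName.NAME);
                set.put((fontName != null ? fontName : key).toString(), name);
            }
    }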
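For the "Type" part of the question (bold, italic, ...), one rough option is to guess the style from the BaseFont name itself. The sketch below is plain Java and purely a naming heuristic: FontNameInfo, cleanName and guessStyle are hypothetical helpers, and a font whose name does not follow the usual "-Bold" / "-Italic" / "-Oblique" conventions will be reported as "None". Where a font descriptor is present, its /Flags, /ItalicAngle and optional /FontWeight entries are likely a more reliable source.

```java
import java.util.Locale;

final class FontNameInfo {
    /** Strips the leading '/' and a six-letter subset tag such as "ABCDEF+". */
    static String cleanName(String baseFont) {
        String name = baseFont.startsWith("/") ? baseFont.substring(1) : baseFont;
        if (name.length() > 7 && name.charAt(6) == '+') {
            name = name.substring(7);
        }
        return name;
    }

    /** Guesses a style from common naming conventions; "None" if nothing matches. */
    static String guessStyle(String baseFont) {
        String lower = cleanName(baseFont).toLowerCase(Locale.ROOT);
        boolean bold = lower.contains("bold");
        boolean italic = lower.contains("italic") || lower.contains("oblique");
        if (bold && italic) {
            return "BoldItalic";
        }
        if (bold) {
            return "Bold";
        }
        if (italic) {
            return "Italic";
        }
        return "None";
    }

    public static void main(String[] args) {
        System.out.println(cleanName("/ABCDEF+Helvetica-Bold"));
        System.out.println(guessStyle("/Courier-BoldOblique"));
        System.out.println(guessStyle("/WaltDisneyScriptv4.1"));
    }
}
```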

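You should be able to extend the code above to extract font size information as well. Note that the size is not part of the font dictionary: it is set in the page content stream by the Tf operator, which takes a font resource name and a size. So if the dictionary does not contain what you need, you can look at the raw content stream (PDF's PostScript-like operator syntax) and get the font information from there.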
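As a rough illustration of reading font information from the raw content stream, the plain-Java sketch below scans an already decoded content stream for "/F1 60 Tf" (select font resource and size) and "(text) Tj" (show a string), and pairs each shown string with the font and size in effect. Everything here is a simplification under stated assumptions: TfScanner is a hypothetical class, the input must already be decompressed text (in iText 5, PdfReader.getPageContent should give you the decoded bytes of a page), only simple literal strings without escapes or nested parentheses are handled, TJ arrays are ignored, and the effective size on the page also depends on the text matrix, which is not tracked. iText 5's com.itextpdf.text.pdf.parser package (RenderListener, TextRenderInfo) is probably the sturdier route for real documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TfScanner {
    // Alternative 1: "/F1 60 Tf" selects a font resource and a size.
    // Alternative 2: "(text) Tj" shows a simple literal string.
    private static final Pattern OPS = Pattern.compile(
            "/(\\S+)\\s+([0-9.]+)\\s+Tf|\\(([^()\\\\]*)\\)\\s*Tj");

    /** Returns one "resourceName size text" entry per Tj operator found. */
    static List<String> scan(String content) {
        List<String> out = new ArrayList<>();
        String font = "?"; // no font selected yet
        String size = "?";
        Matcher m = OPS.matcher(content);
        while (m.find()) {
            if (m.group(1) != null) {
                font = m.group(1);
                size = m.group(2);
            } else {
                out.add(font + " " + size + " " + m.group(3));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String content = "BT /F1 45 Tf (iText ) Tj /F2 60 Tf (in Action) Tj ET";
        for (String entry : scan(content)) {
            System.out.println(entry);
        }
    }
}
```

The resource name found here (for example F1) is the key under which the font appears in the /Font resource dictionary, so it can be looked up there to get the BaseFont for each piece of text.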