开发者

How to create a PDF document from languages of Unicode char set regarding using third party Fonts

开发者 https://www.devze.com 2023-03-09 14:59 出处:网络
I\'m using PDFBox and iText to create a simple (just paragraphs) pdf document from various languages. Something like :

I'm using PDFBox and iText to create a simple (just paragraphs) pdf document from various languages. Something like :

pdfBox:

private static void createPdfBoxDocument(File from, Fil开发者_如何学JAVAe to) {
    PDDocument document = null;
    try {
        document = new TextToPDF().createPDFFromText(new FileReader(from));
        document.save(new FileOutputStream(to));
    } finally {
        if (document != null)
            document.close();
    }
}

private void createPdfBoxDoc() throws IOException, FileNotFoundException, COSVisitorException {
    PDDocument document = new PDDocument();
    PDPage page = new PDPage();
    document.addPage(page);
    PDPageContentStream contentStream = new PDPageContentStream(document, page);

    PDType1Font font = PDType1Font.TIMES_ROMAN;
    contentStream.setFont(font, 12);
    contentStream.beginText();
    contentStream.moveTextPositionByAmount(100, 400);
    contentStream.drawString("š");
    contentStream.endText();
    contentStream.close();
    document.save("test.pdf");
    document.close();
}

itext:

private static Font blackFont = new Font(Font.FontFamily.COURIER, 12, Font.NORMAL, BaseColor.BLACK);

private static void createITextDocument(File from, File to) {
    Document document = new Document();
    PdfWriter.getInstance(document, new FileOutputStream(to));
    document.open();
    addContent(document, getParagraphs(from));
    document.close();
}

private static void addContent(Document document, List<String> paragraphs) { 

    for (int i = 0; i < paragraphs.size(); i++) {
        document.add(new Paragraph(paragraphs.get(i), blackFont));
    }
}

The input files are encoded in UTF-8 and some languages of Unicode char set, like Russian alphabet etc., are not rendered properly in pdf. The Fonts in both libraries don't support Unicode charset I suppose and I can't find any documentation on how to add and use third party fonts. Could please anybody help me out with an example ?


If you are using iText, it has quite good support.

In iText in Action (chapter 2.2.2) you can read more.

You have to download some unicode Fonts like arialuni.ttf and do it like this :

    public static File fontFile = new File("fonts/arialuni.ttf");

    public static void createITextDocument(File from, File to) throws DocumentException, IOException {

        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(to));
        document.open();
        writer.getAcroForm().setNeedAppearances(true);
        BaseFont unicode = BaseFont.createFont(fontFile.getAbsolutePath(), BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

        FontSelector fs = new FontSelector();
        fs.addFont(new Font(unicode));

        addContent(document, getParagraphs(from), fs);
        document.close();
    }

    private static void addContent(Document document, List<String> paragraphs, FontSelector fs) throws DocumentException { 

        for (int i = 0; i < paragraphs.size(); i++) {
            Phrase phrase = fs.process(paragraphs.get(i));
            document.add(new Paragraph(phrase));
        }
    }

arialuni.ttf fonts work for me, so far I checked it support for

BG, ES, CS, DA, DE, ET, EL, EN, FR, IT, LV, LT, HU, MT, NL, PL, PT, RO, SK, SL, FI, SV

and only PDF in Romanian language wasn't created properly...

With PDFBox it's almost the same:

private void createPdfBoxDoc() throws IOException, FileNotFoundException, COSVisitorException {
    PDDocument document = new PDDocument();
    PDPage page = new PDPage();
    document.addPage(page);
    PDPageContentStream contentStream = new PDPageContentStream(document, page);

    PDFont font = PDTrueTypeFont.loadTTF(document, "fonts/arialuni.ttf");
    contentStream.setFont(font, 12);
    contentStream.beginText();
    contentStream.moveTextPositionByAmount(100, 400);
    contentStream.drawString("š");
    contentStream.endText();
    contentStream.close();
    document.save("test.pdf");
    document.close();
}

However as Gagravarr says, it doesn't work because of this issue PDFBOX-903 . Even with 1.6.0-SNAPSHOT version. Maybe trunk will work. I suggest you to use iText. It works there perfectly.


You may find this answer helpful - it confirms that you can't do what you need with one of the standard type 1 fonts, as they're Latin1 only

In theory, you just need to embed a suitable font into the document, which handles all your codepoints, and use that. However, there's at least one open bug with writing unicode strings, so there's a chance it might not work just yet... Try the latest pdfbox from svn trunk too though to see if it helps!


In my project, I just copied the font that supported UTF8 (or whatever language you want) to a directory (or you can used Windows fonts path) and add some code, it looked like this

BaseFont baseFont = BaseFont.createFont("c:\\a.ttf", BaseFont.IDENTITY_H,true);
Font font = new Font(baseFont);
document.add(new Paragraph("Not English Text",font));

Now, you can use this font to show your text in various languages.


//use this code.Sometimes setfont() willnot work with Paragraph

try
{

    FileOutputStream out=new FileOutputStream(name);

    Document doc=new Document();

    PdfWriter.getInstance(doc, out);

    doc.open();

    Font f=new Font(FontFamily.TIMES_ROMAN,50.0f,Font.UNDERLINE,BaseColor.RED);
    Paragraph p=new Paragraph("New PdF",f);

    p.setAlignment(Paragraph.ALIGN_CENTER);

    doc.add(p);
    doc.close();
    }
    catch(Exception e)
    {
        System.out.println(e);
    }
}
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号