开发者

List fonts used by a Word Document (faster method)

开发者 https://www.devze.com 2023-02-15 20:39 出处:网络
I am working on a process for validating documents to make sure that they meet corporate standards.One o开发者_JAVA百科f the steps is to make sure that the Word document does not use non-approved font

I am working on a process for validating documents to make sure that they meet corporate standards. One o开发者_JAVA百科f the steps is to make sure that the Word document does not use non-approved fonts.

I have the following stub of code, which works:

    Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass()
    Dim wordDocument As Word.Document = Nothing

    Dim fontList As New List(Of String)()

    Try
        wordDocument = wordApplication.Documents.Open(FileName:="document Path")
        'I've also tried using a for loop with an integer counter, no change in speed'
        For Each c As Word.Range In wordDocument.Characters
            If Not fontList.Contains(c.Font.Name) Then
                fontList.Add(c.Font.Name)
            End If
        Next

But this is incredibly slow! Incredibly slow = 2500 characters/minute (I timed it with StopWatch). Most of my files are around 6,000 words/30,000 characters (about 25 pages). But there are some documents that are in the 100's of pages...

Is there a faster way of doing this? I have to support Office 2003 format files, so the Open XML SDK isn't an option.

--UPDATE--

I tried running this as a Word macro (using the code found @ http://word.tips.net/Pages/T001522_Creating_a_Document_Font_List.html) and it runs much faster (under a minute). Unfortunately for my purposes I don't believe a Macro will work.

--UPDATE #2--

I took Chris's advice and converted the document to Open XML format on the fly. I then used the following code to find all RunFonts objects and read the font name:

    Using docP As WordprocessingDocument = WordprocessingDocument.Open(tmpPath, False)
        Dim runFonts = docP.MainDocumentPart.Document.Descendants(Of RunFonts)().Select(
                            Function(c) If(c.Ascii.HasValue, c.Ascii.InnerText, String.Empty)).Distinct().ToList()

        fontList.AddRange(runFonts)
    End Using


You might have to support Office 2003 but that doesn't mean you have to parse it in that format. Take the Office 2003 documents, temporarily convert them to DOCX files, open that as a ZIP file, parse the /word/fontTable.xml file and then delete the DOCX.


Another way I found without coding is :

  • export document as PDF
  • open it in adobe reader
  • in adobe reader go to : file menu\properties and then fonts tab, which lists the family fonts and sub-fonts those used in document.

Even maybe Developers and Programmers could use this procedure to code it and take out PDF Font list to what could be useful for more people.


You can speed things up a lot by iterating over paragraphs. Only if a paragraph contains mixed fonts would you need to check character by character. The Name, Bold, Italic, etc. properties have a special "indeterminate" value, which is an empty string for the Name or 9999999 for the style attributes.

So, for example, if Bold = 9999999 it means the paragraph contains some bold and some non-bold characters.

I include the following fragment to show the general idea:

For Each P as Paragraph in doc.Paragraphs
    Dim R as Range = P.Range
    If R.Font.Name = "" Or R.Font.Bold = 9999999 Or R.Font.Italic = 9999999
        Or R.Font.Underline = 9999999 Or R.Font.Strikethrough = 9999999 Then
        ' This paragraph uses mixed fonts, so we need to analyse character by character
        AnalyseCharacterByCharacter(R)
    Else
        ' R.Font is used throughout this paragraph
        FontHasBeenUsed(R.Font)
    End If
 Next


That's the wrong way round I think. We are looking for the fact of a font's inclusion not the location of that font. It's an existential rather than a positional problem.

Much, much, much quicker is to iterate the fonts. Only trick is that Word is sometimes fussy about spaces and so forth. This works well for me

Sub FindAllFonts()
    Dim lWhichFont As Long, sTempName As String, sBuffer As String
    For lWhichFont = 1 To FontNames.Count
       sTempName = FontNames(lWhichFont)
       If FindThisFont(sTempName) Then
          sBuffer = sBuffer & "Found " & sTempName & vbCrLf
        Else
           If FindThisFont(Replace(sTempName, " ", "")) Then
              sBuffer = sBuffer & "Found " & sTempName & " *" & vbCrLf
           End If
        End If
   Next
   Documents.Add
   Selection.TypeText Text:=sBuffer
End Sub

Function FindThisFont(sName As String) As Boolean
   Selection.HomeKey Unit:=wdStory
   Selection.Find.ClearFormatting
   With Selection.Find
       .Font.Name = sName
       .Forward = True
       .Format = True
       .Execute
       If .Found() Then
          FindThisFont = True
       Else
          FindThisFont = False
       End If
   End With
End Function

It works very fast (the only slow component is the font iteration)

(It won't find fonts not on your system, obviously, but if you are trying to prepare for transport something you wrote, and some assistant program has put Helvetica or MS Minchin in, you can find it)

OK, people told me that this was not what everyone wants, people want to find fonts that aren't on their machines. But the other way is still too slow and involves looking for a lot of stuff not there. So here is an alternative that saves out as rtf, and processes the rtf header.

Sub FIndAllFonts2()
    Dim sOldName As String, sOldPath As String, sBuffer As String, sBuffer2 As String, sOut As String, sBit
    Dim lCounter As Long, lPos As Long, lPos2 As Long, lStopAt As Long
    Dim objPic As InlineShape, objShp As Shape
    ' rememer old name for reloading
    sOldName = ActiveDocument.Path
    'delete image to make file out small
    For Each objPic In ActiveDocument.InlineShapes: objPic.Delete: Next
    For Each objShp In ActiveDocument.Shapes: objShp.Delete: Next
    ActiveDocument.SaveAs "tempout.rtf", wdFormatRTF
    sTempFile = ActiveDocument.Path
    ActiveDocument.Close
    lPos2 = 1
    ' we only want the header, but we don't know how long the file is
    'I am using a Mac, so filesystemobject not available
    ' if you end up having a huge header, make 2500 bigger
    lStopAt = 2500
    Open sTempFile For Input As #1
    Do While Not EOF(1) And lPos2 < lStopAt
        sBit = Input(1, #1)
        sBuffer = sBuffer & sBit
        lPos2 = lPos2 + 1
    Loop
    Close #1
    'delete temp file
    Kill sTempFile
    ' loop through header, fonts identified in the table as {\f1\
    ' if you have more than 100 fonts, make this bigger
    ' not all numbers are used
    lStopAt = 100
    For lCounter = 1 To lStopAt
        lPos = InStr(sBuffer, "{\f" & lCounter & "\")
        If lPos > 0 Then
            sBuffer = Mid(sBuffer, lPos)
            lPos = InStr(sBuffer, ";")
            sBuffer2 = Left(sBuffer, lPos - 1)
            'this is where you would look for the alternate name if you want it
            lPos = InStr(sBuffer2, "{\*\falt")
            If lPos > 0 Then
                sBuffer2 = Left(sBuffer2, lPos - 1)
                sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, " ") + 1) & " *" 'indicate it is the shorter ascii name
            Else
                sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, "}") + 1)
            End If
            sOut = sOut & sBuffer2 & vbCrLf
        End If
    Next
    'reopen old file
    Documents.Open sOldName
    Set newdoc = Documents.Add
    sOut = "Fonts in use in document " & sOldName & vbCrLf & sOut
    Selection.TypeText Text:=sOut
End Sub

This goes through my 350 page thesis draft in under 20 seconds on a MacBook Pro. So it is quick enough to be useful.


If you want to get all fonts used within your doc. you could simply get all of them through one line using OPEN XML:

 using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true))
 {
     var fontlst = doc.MainDocumentPart.FontTablePart.Fonts.Elements<Font>();
 }

Each Font element has its "Name" property which is referenced in element in the properties of a text run.

Hint: you have to consider that each word doc. does not have more than 2 Font table parts, one in main part and the other in glossary part so don't forget to consider also glossary one if needed.

You could download OPEN XML SDK from here


This might be quicker than converting documents to .docx before processing them with OpenXml (for the record, you could also work with the property document.Content.WordOpenXML instead of document.Content.XML):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using Word = NetOffice.WordApi;

namespace _5261108
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var app = new Word.Application())
            {
                var doc = app.Documents.Open(Path.GetFullPath("test.docx"));

                foreach (var font in GetFontNames(doc))
                {
                    Console.WriteLine(font);
                }

                app.Quit(false);
            }

            Console.ReadLine();
        }

        private static IEnumerable<string> GetFontNames(Word.Document document)
        {
            var xml = document.Content.XML;
            var doc = XDocument.Parse(xml);
            var fonts = doc.Descendants().First(n => n.Name.LocalName == "fonts").Elements().Where(n => n.Name.LocalName == "font");
            var fontNames = fonts.Select(f => f.Attributes().First(a => a.Name.LocalName == "name").Value);
            return fontNames.Distinct();
        }
    }
}

Converted for your convenience:

Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Xml.Linq
Imports Word = NetOffice.WordApi

Namespace _5261108
    Class Program
        Private Shared Sub Main(args As String())
            Using app = New Word.Application()
                Dim doc = app.Documents.Open(Path.GetFullPath("test.docx"))

                For Each font As var In GetFontNames(doc)
                    Console.WriteLine(font)
                Next

                app.Quit(False)
            End Using

            Console.ReadLine()
        End Sub

        Private Shared Function GetFontNames(document As Word.Document) As IEnumerable(Of String)
            Dim xml = document.Content.XML
            Dim doc = XDocument.Parse(xml)
            Dim fonts = doc.Descendants().First(Function(n) n.Name.LocalName = "fonts").Elements().Where(Function(n) n.Name.LocalName = "font")
            Dim fontNames = fonts.[Select](Function(f) f.Attributes().First(Function(a) a.Name.LocalName = "name").Value)
            Return fontNames.Distinct()
        End Function
    End Class
End Namespace

'=======================================================
'Service provided by Telerik (www.telerik.com)
'Conversion powered by NRefactory.
'Twitter: @telerik
'Facebook: facebook.com/telerik
'=======================================================


Try this:

Sub Word_Get_Document_Fonts()
  Dim report As String
  Dim J As Integer
  Dim font_name As String
  report = "Fonts in use in this document:" & vbCr & vbCr
  For J = 1 To FontNames.Count
    font_name = FontNames(J)
    Set myrange = ActiveDocument.Range
    myrange.Find.ClearFormatting
    myrange.Find.Font.Name = font_name
    With myrange.Find
      .text = "^?"
      .Replacement.text = ""
      .Forward = True
      .Wrap = wdFindStop
      .Format = True
    End With
    myrange.Find.Execute
    If myrange.Find.Found Then
      report = report & font_name & vbCr
    End If
  Next J
  MsgBox (report)
End Sub
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号