Scenario: I have about 14,000 Word documents that need to be converted from "Microsoft Word 97 - 2003 Document" to "Microsoft Word Document". In other words, upgraded to the 2010 format (.docx).
Question: Is there an easy way to do this using APIs or something?
Note: I've only been able to find a Microsoft program that converts the documents to .docx, but they still open in compatibility mode. It would be nice if they could just be converted to the new format, with the same functionality you get when you open an old document and Word gives you the option to convert it.
Edit: Just found http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word._document.convert.aspx and am looking into how to use it.
EDIT2: This is my current function for converting the documents:
Private Sub btnConvert_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnConvert.Click
    FolderBrowserDialog1.ShowDialog()
    Dim mainThread As Thread
    If Not String.IsNullOrEmpty(FolderBrowserDialog1.SelectedPath) Then
        lstFiles.Clear()
        DirSearch(FolderBrowserDialog1.SelectedPath)
        ThreadPool.SetMaxThreads(1, 1)
        lstFiles.RemoveAll(Function(y) y.Contains(".docx"))
        TextBox1.Text += "Conversion started at " & DateTime.Now().ToString & Environment.NewLine
        For Each x In lstFiles
            ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ConvertDoc), x)
        Next
    End If
End Sub

Private Sub ConvertDoc(ByVal path As String)
    Dim word As New Microsoft.Office.Interop.Word.Application
    Dim doc As Microsoft.Office.Interop.Word.Document
    word.Visible = False
    Try
        Debug.Print(path)
        doc = word.Documents.Open(path, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing)
        doc.Convert()
    Catch ex As Exception
        ''do nothing
    Finally
        doc.Close()
        word.Quit()
    End Try
End Sub
It lets me select a path and then finds all .doc files within the subfolders. That code isn't important; all the files for conversion are in lstFiles. The only problem at the moment is that it takes a really long time to convert even just 10 documents. Should I be using one Word application per document instead of reusing it? Any suggestions?
Also, Word opens after about 2 or 3 conversions and starts flashing, but it keeps converting.
EDIT3: Tweaked the code above a little and it runs cleaner. It still takes 1 min 10 sec to convert 8 files, though. With 14,000 files to convert, this method will take a fairly long time.
EDIT4: Changed the code again. It uses a thread pool now and seems to run a bit faster. I still need to run it on a better computer to convert all the documents, or do them slowly, folder by folder. Can anyone think of any other way to optimize this?
I ran your code locally, with just some minor edits for improved tracing and timing, and it "only" took 13.73 seconds to do 12 files. At that rate (about 1.14 seconds per file), your 14,000 files would take roughly 4.5 hours. I'm running Visual Studio 2010 on Windows 7 x64 with a dual-core processor. Perhaps you can just use a faster computer?
Here's my full code, this is just a form with a single button, Button1, and a FolderBrowserDialog, FolderBrowserDialog1:
Imports System.IO

Public Class Form1
    Dim lstFiles As List(Of String) = New List(Of String)

    Private Sub DirSearch(path As String)
        Dim thingies = From file In Directory.GetFiles(path) Where file.EndsWith(".doc") Select file
        lstFiles.AddRange(thingies)
        For Each subdir As String In Directory.GetDirectories(path)
            DirSearch(subdir)
        Next
    End Sub

    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
        FolderBrowserDialog1.ShowDialog()
        If Not String.IsNullOrEmpty(FolderBrowserDialog1.SelectedPath) Then
            lstFiles.Clear()
            DirSearch(FolderBrowserDialog1.SelectedPath)
            Dim word As New Microsoft.Office.Interop.Word.Application
            Dim doc As Microsoft.Office.Interop.Word.Document
            lstFiles.RemoveAll(Function(y) y.Contains(".docx"))
            Dim startTime As DateTime = DateTime.Now
            Debug.Print("Timer started at " & DateTime.Now().ToString & Environment.NewLine)
            For Each x In lstFiles
                word.Visible = False
                Debug.Print(x + Environment.NewLine)
                doc = word.Documents.Open(x)
                doc.Convert()
                doc.Close()
            Next
            word.Quit()
            Dim endTime As DateTime = DateTime.Now
            Debug.Print("Took " & endTime.Subtract(startTime).TotalSeconds & " to process " & lstFiles.Count & " documents" & Environment.NewLine)
        End If
    End Sub
End Class
Use Word automation: open each document and save it with the WdSaveFormat value wdFormatDocumentDefault, which should be .docx.
http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.wdsaveformat%28v=office.14%29.aspx
Or try your hand at the Convert method you mentioned. Either way, it is 100% possible and should be fairly easy.
Edit: If the converter Daniel posted works, that's far easier and he deserves all the credit : )
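For illustration, the open-and-save approach might look like the sketch below. It is not tested against the asker's files and makes several assumptions: Word 2010 or later is installed (SaveAs2 and WdCompatibilityMode belong to the Word 2010 object model), the names DocxSaveSketch, SaveAsDocx and wordApp are made up for this example, and it uses the explicit wdFormatXMLDocument value rather than wdFormatDocumentDefault. Passing wdCurrent as the compatibility mode is intended to address the compatibility-mode problem from the question, but verify that on a few sample files first.

```vbnet
Imports System.IO
Imports Word = Microsoft.Office.Interop.Word

Module DocxSaveSketch
    ' Saves a .docx copy of one .doc file next to the original.
    ' Assumption: Word 2010 or later; SaveAs2 is not available before that.
    Sub SaveAsDocx(ByVal wordApp As Word._Application, ByVal sourcePath As String)
        ' Object locals, because the interop methods take ByRef Object parameters.
        Dim source As Object = sourcePath
        Dim target As Object = Path.ChangeExtension(sourcePath, ".docx")
        Dim doc As Word._Document = Nothing
        Try
            doc = wordApp.Documents.Open(FileName:=source, AddToRecentFiles:=False)
            ' wdFormatXMLDocument is the .docx format; wdCurrent asks Word to
            ' save in its latest mode rather than keep compatibility mode.
            doc.SaveAs2(FileName:=target,
                        FileFormat:=Word.WdSaveFormat.wdFormatXMLDocument,
                        CompatibilityMode:=Word.WdCompatibilityMode.wdCurrent)
        Finally
            ' The source file is left untouched, so discard any pending changes.
            If doc IsNot Nothing Then
                doc.Close(SaveChanges:=Word.WdSaveOptions.wdDoNotSaveChanges)
            End If
        End Try
    End Sub
End Module
```

The caller would create one Word application, call SaveAsDocx for every path in its file list, and call Quit once at the end, so that Word is started only once for the whole batch.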
You can use the free Office File Converter.
The settings are explained here:
http://technet.microsoft.com/en-us/library/cc179019.aspx
There is a file list setting.
Try this:
using System;
using Microsoft.Office.Interop.Word;

// filename and filepath are assumed to be defined elsewhere.
ApplicationClass word = new ApplicationClass();
object nullvalue = Type.Missing;
object filee = filename;
// Save under a .docx name and pass the matching format; otherwise the copy stays in the old format.
object file2 = String.Format("{0}{1}", filepath, "convertedfile.docx");
object format = WdSaveFormat.wdFormatXMLDocument;
Document doc = word.Documents.Open(ref filee, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue);
doc.SaveAs(ref file2, ref format, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue);
((_Document)doc).Close(ref nullvalue, ref nullvalue, ref nullvalue);
((_Application)word).Quit(ref nullvalue, ref nullvalue, ref nullvalue);