
SVN or Mercurial version control of Word documents

Developer https://www.devze.com 2023-03-13 06:36 Source: Internet

As far as I know, Microsoft went to some sort of XML-based representation in their most recent version of Office. If that's really true, then I would assume that version control would work, although you would obviously have to resolve any embedded changes with the old

<<<<<<

======

>>>>>>

marks in them before loading Word.

This other question mentions the issue, but it seems to be taken as a foregone conclusion that version control simply won't work with Word, and I want to know why.

Is version control (i.e. Subversion) applicable in document tracking?


There's the zipdoc extension for Mercurial, which seems to handle compressed files like XML-based Word documents by storing them uncompressed internally in order to get meaningful deltas and in order to merge them in a meaningful way. I did not test it, but it sounds like the thing you're looking for.
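To make the "stored uncompressed" idea concrete, here is a minimal Python sketch. It is not zipdoc's code and makes no claim about how that extension is implemented; the function name `store_uncompressed` is made up for illustration. It simply rewrites a zip archive (such as a .docx) so that every member is stored without compression, which leaves the XML visible to a delta algorithm.

```python
import zipfile


def store_uncompressed(src_path, dst_path):
    """Copy every member of a zip archive into a new archive, uncompressed.

    Hypothetical helper for illustration only; not part of zipdoc.
    """
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", compression=zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            # Keep the member name and timestamp, but force "stored" mode.
            stored = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            stored.compress_type = zipfile.ZIP_STORED
            stored.external_attr = info.external_attr
            dst.writestr(stored, src.read(info))
```

The result is still a valid zip archive, only larger; the repository's own delta compression is then left to deal with the size.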


The foregone conclusion is that although most, if not all, version control systems, Mercurial included, do indeed work with binary files, they suck at diffing and merging them.

Word files are binary in nature. Yes, the latest incarnations of Office have switched to the "Office Open XML" format, which includes XML, but they still wrap the entire thing in a zip file, which means it is still binary (and yes, I know that all files are in fact binary, you know what I mean).

Now, many version control systems, including Mercurial and Subversion, can be told how to merge any file type they consider binary by giving them an external merge tool that can do the job.

This basically means that if you can find a program that can take two Word files, diff them, and allow you to reconcile differences, then you're in business.
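As a rough illustration of what the "diff" half of such a program could look like, here is a Python sketch using only the standard library. It assumes the main text lives in the `word/document.xml` part and in the standard WordprocessingML namespace; the function names are invented for this example. It compares paragraph text only and ignores formatting, tables, headers, comments and tracked changes, so it is nowhere near a real merge tool.

```python
import difflib
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used by .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(path):
    """Return the plain text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the <w:t> pieces.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def diff_docx(old_path, new_path):
    """Unified diff of two documents, one line per paragraph."""
    return list(difflib.unified_diff(
        docx_paragraphs(old_path),
        docx_paragraphs(new_path),
        fromfile=old_path,
        tofile=new_path,
        lineterm="",
    ))
```

Something like `print("\n".join(diff_docx("v1.docx", "v2.docx")))` (file names made up) would show which paragraphs changed. Reconciling the changes back into a valid Word file is the hard part, and this sketch does not attempt it.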

If you unzipped the Word file and versioned the contents, then yes, you could get merge conflicts that you can resolve through Mercurial. However, the contents would still be in a format that you didn't write yourself, so reconciling a hard merge conflict might not just be difficult, it might be impossible.
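For what it's worth, the unzip-and-version approach could be sketched like this in Python. The helper names `unpack_docx` and `pack_docx` are hypothetical, and I have not checked that Word accepts every archive repacked this way, so treat it as an illustration of the workflow rather than a tested tool.

```python
import os
import zipfile


def unpack_docx(docx_path, target_dir):
    """Extract all parts of a .docx into target_dir; return the part names."""
    with zipfile.ZipFile(docx_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())


def pack_docx(source_dir, docx_path):
    """Zip a directory tree back into a .docx-style archive.

    docx_path should live outside source_dir, otherwise the archive
    being written could end up including itself.
    """
    with zipfile.ZipFile(docx_path, "w",
                         compression=zipfile.ZIP_DEFLATED) as archive:
        for folder, _subdirs, files in os.walk(source_dir):
            for name in sorted(files):
                full = os.path.join(folder, name)
                # Zip member names always use forward slashes.
                arcname = os.path.relpath(full, source_dir).replace(os.sep, "/")
                archive.write(full, arcname)
```

You would commit the unpacked tree and run `pack_docx` only when you need a file to open in Word. The XML parts are typically written with few line breaks, which is one reason line-based conflict markers in them are so hard to reconcile.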

In short, version control systems excel at storing binary files, but they suck at diffing and merging them.

If you never need to diff or merge, you can use Mercurial or Subversion or whatever, and it will work just great.


The new formats are in fact XML-based; however, the .docx file itself is actually a zip file. So ultimately it is still a binary file...


I suppose it depends on who will be using the documents. Usually only developers are comfortable with using VCSs, so you may be complicating the lives of people who just want to access via a shared drive.

On the other hand, revision history is often very important, and I often see Word documents with big summaries at the top, listing all of the changes, which seems really silly.

I think that cloud-based solutions like Google Docs will probably fill this gap in the future. Or maybe just a team wiki. Generally you are trading off some of the fancier Word features for a more open sharing experience, but Google Docs is becoming pretty powerful.


I'd put the use case in the foreground. Quite a lot of people in the world need tools to compare two versions of the same Word document, and they're not developers but, for example, attorneys. At my law-firm clients, documents go out to their clients and come back with edits, so a document-based comparison is absolutely necessary. They use either the built-in Word comparison function or third-party tools (WorkShare DeltaView is something like an industry standard). These tools can also compare PDF documents.

The use case here is clearly content-driven: the attorneys need to quickly get an overview of the differences between two versions of a contract. Both versions can be stored in a document management system as "versions", or in the case of DeltaView, the delta file can be stored for further review.

What can be the use case for a developer? Source control systems mean "SOURCE" control, and not "control all stuff coming up in my project". I'd rather store project-related documents (plans, specs, requirements, e-mails) in another store, not in Mercurial. On the other hand, I often use Word documents or Word templates as part of the solution in document template projects, and of course these documents are source, so they are saved in the repo. But the need to visualize differences has been relatively small up to now, especially if your commit comments are good ("Version 1 - init", "Version 2: added textbox in header", "Version 3: added footer information" etc.).


Replies to various points or assumptions read here:

  • Yes, Subversion does a very good job at diffing binary files for storage. For example, 60 versions of a 30 MB file take 90 MB for one of my documents with lots of pictures.
  • Yes, TortoiseSVN automatically calls the native MS Word diff and thus allows you to see the exact differences (including formatting) between any two versions, at character level.
  • Consider using MS Word's Track Changes feature instead of an a posteriori comparison; this will also keep track of moves, keep authors, etc. It answers different needs...
  • Yes, a docx file is a zipped directory with XML files. Try it: just open a docx file with a zip utility or unzip it!
  • Consider saving in XML instead of docx if you want keyword expansion:
      • Save your file as .xml instead of .docx; though your file gets much bigger (it is no longer zipped), you may save space with svn compression, which I expect is more efficient on text than on binaries.
      • Insert your svn keywords (e.g. $Rev$) in the properties of the Word document (using File-Info, Properties in the right pane).
      • Display the info in your document using fields: Insert-Quick Parts-Document Property, for example.

That seems to work for me.

Rodolphe


Depends on the setting.

If it's a short-lived doc that you want to track changes in, then use Word's internal change tracking.

Otherwise use SVN or SharePoint or some other external means of recording versioned documents. If you don't, you run the risk that anybody could overwrite the file, with all the versioning information lost.
