I need to traverse the nodes between a bookmark start tag and a bookmark end tag. The problem appears to break down into a tree traversal, but I am having trouble pinning down the correct algorithm. The bookmark start and end elements are non-composite nodes (no children) and may appear at an arbitrary depth in the tree. The bookmark start and end are also not guaranteed to be at the same depth.
If you draw the tree structure for the document, I want to examine all nodes between the start and end bookmark. I think an algorithm that traverses an unbalanced tree starting at node x and ending at node y would work. Does this sound feasible, or am I missing something?
If this is feasible, could you point me in the direction of a tree traversal that returns those nodes?
This depends on what you want to do. However, if you are primarily interested in the text between two bookmarks, then this is one of those cases where XmlDocument / XPath semantics are easier to use than LINQ to XML or the strongly-typed object model of the Open XML SDK V2. The semantics of the 'following::*' axis of XPath are what you want: it selects every element that comes after the context node in document order, excluding the context node's own descendants (a bookmark start has none). The following example uses XmlDocument and XPath to print the names of the nodes between the start and end of a bookmark.
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Xml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    class Program
    {
        // Loads the XML of a package part into an XmlDocument.
        public static XmlDocument GetXmlDocument(OpenXmlPart part)
        {
            XmlDocument xmlDoc = new XmlDocument();
            using (Stream partStream = part.GetStream())
            using (XmlReader partXmlReader = XmlReader.Create(partStream))
                xmlDoc.Load(partXmlReader);
            return xmlDoc;
        }

        static void Main(string[] args)
        {
            using (WordprocessingDocument doc =
                WordprocessingDocument.Open("Test.docx", false))
            {
                XmlDocument xmlDoc = GetXmlDocument(doc.MainDocumentPart);
                string wordNamespace =
                    "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
                XmlNamespaceManager nsmgr =
                    new XmlNamespaceManager(xmlDoc.NameTable);
                nsmgr.AddNamespace("w", wordNamespace);

                // Find the start of the bookmark whose id is 0.
                XmlElement bookmarkStart = (XmlElement)xmlDoc.SelectSingleNode(
                    "descendant::w:bookmarkStart[@w:id='0']", nsmgr);
                if (bookmarkStart == null)
                {
                    Console.WriteLine("No bookmarkStart with w:id='0' found.");
                    return;
                }

                // Every element after the bookmark start, in document order.
                XmlNodeList nodesFollowing =
                    bookmarkStart.SelectNodes("following::*", nsmgr);

                // Stop at the bookmarkEnd that has the same id. Comparing the
                // local name and namespace avoids depending on the "w" prefix.
                var nodesBetween = nodesFollowing
                    .Cast<XmlElement>()
                    .TakeWhile(n => !(n.LocalName == "bookmarkEnd"
                        && n.NamespaceURI == wordNamespace
                        && n.GetAttribute("id", wordNamespace) == "0"));

                foreach (XmlElement item in nodesBetween)
                {
                    Console.WriteLine(item.Name);
                    // Show the attributes of any other bookmarks inside the range.
                    if (item.NamespaceURI == wordNamespace
                        && (item.LocalName == "bookmarkStart"
                            || item.LocalName == "bookmarkEnd"))
                        foreach (XmlAttribute att in item.Attributes)
                            Console.WriteLine("{0}:{1}", att.Name, att.Value);
                }
            }
        }
    }
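For readers who want the explicit tree walk the question asks about, here is a minimal sketch of the same idea without XPath: a pre-order (document order) step function that moves from one node to the next, applied repeatedly from the start node until the end node is reached. The names DocumentOrder, NextInDocumentOrder and NodesBetween are hypothetical helpers introduced for illustration, and the inline XML is a made-up fragment rather than a real document part. Unlike 'following::*', this walk also visits text nodes, so the example filters by node type where needed. It assumes the end node really does come after the start node; otherwise it simply runs to the end of the document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

static class DocumentOrder
{
    // Hypothetical helper: the node that follows 'node' in document order
    // (first child if any, else next sibling, else the next sibling of the
    // nearest ancestor that has one). Returns null at the end of the document.
    public static XmlNode NextInDocumentOrder(XmlNode node)
    {
        if (node.FirstChild != null)
            return node.FirstChild;
        while (node != null)
        {
            if (node.NextSibling != null)
                return node.NextSibling;
            node = node.ParentNode;
        }
        return null;
    }

    // Hypothetical helper: all nodes strictly between 'start' and 'end'.
    public static IEnumerable<XmlNode> NodesBetween(XmlNode start, XmlNode end)
    {
        for (XmlNode n = NextInDocumentOrder(start);
             n != null && !ReferenceEquals(n, end);
             n = NextInDocumentOrder(n))
        {
            yield return n;
        }
    }
}

class Program
{
    static void Main()
    {
        const string w =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

        // Made-up fragment: the bookmark starts in one paragraph and ends
        // in the next, so the two markers have different parents.
        var doc = new XmlDocument();
        doc.LoadXml(
            "<w:body xmlns:w='" + w + "'>" +
            "<w:p><w:bookmarkStart w:id='0' w:name='demo'/>" +
            "<w:r><w:t>Hello</w:t></w:r></w:p>" +
            "<w:p><w:r><w:t>World</w:t></w:r><w:bookmarkEnd w:id='0'/></w:p>" +
            "</w:body>");

        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("w", w);
        XmlNode start = doc.SelectSingleNode("//w:bookmarkStart[@w:id='0']", nsmgr);
        XmlNode end = doc.SelectSingleNode("//w:bookmarkEnd[@w:id='0']", nsmgr);

        // Print element names only, mirroring what 'following::*' returns.
        foreach (XmlElement e in DocumentOrder.NodesBetween(start, end)
                                              .OfType<XmlElement>())
            Console.WriteLine(e.Name);

        // Concatenate the text nodes found inside the bookmark range.
        string text = string.Concat(
            DocumentOrder.NodesBetween(start, end)
                         .OfType<XmlText>()
                         .Select(t => t.Value));
        Console.WriteLine(text);
    }
}
```

Because the step function only ever looks at FirstChild, NextSibling and ParentNode, it does not care how deep the start and end markers are or whether they share a parent, which is the property the question is after.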
I've put together an algorithm that can easily retrieve the text of a bookmark:
How to Retrieve the Text of a Bookmark from an OpenXML WordprocessingML Document
I've also written code to replace the text of a bookmark:
Replacing Text of a Bookmark in an OpenXML WordprocessingML Document
-Eric