If I have a DOM, is it possible to get the reverse XPath of an element? For example, if I have:
<start>
<nodes>
<node>
<name>Whatever</name>
</node>
<node>
<name>Whatever 2</name>
</node>
</nodes>
</start>
If, for example, I have a reference to the node with the name "Whatever 2", is it possible to get back /start/nodes/node/name[. = "Whatever 2"]?
Here's a very simple approach to walking up the tree using the Java DOM API in the Scala REPL:
First we import the relevant packages and set up our document builder and source:
scala> import org.w3c.dom._
import org.w3c.dom._
scala> import javax.xml.parsers._
import javax.xml.parsers._
scala> val factory = DocumentBuilderFactory.newInstance()
factory: javax.xml.parsers.DocumentBuilderFactory = ...
scala> val builder = factory.newDocumentBuilder()
builder: javax.xml.parsers.DocumentBuilder = ...
scala> val source = new org.xml.sax.InputSource()
source: org.xml.sax.InputSource = org.xml.sax.InputSource@7ecec7c6
Now to parse the example document:
scala> val content = """<start>
<nodes>
<node><name>Whatever</name></node>
<node><name>Whatever 2</name></node>
</nodes>
</start>"""
content: java.lang.String = ...
scala> source.setCharacterStream(new java.io.StringReader(content))
scala> val document = builder.parse(source)
document: org.w3c.dom.Document = [#document: null]
This is a very simple function that recursively walks up the DOM to the document root:
scala> def path: Node => String = {
| case document: Document => ""
| case node => path(node.getParentNode) + "/" + node.getNodeName
| }
path: org.w3c.dom.Node => String
And we pick the second <name> node to test:
scala> val node = document.getElementsByTagName("name").item(1)
node: org.w3c.dom.Node = [name: null]
We get what we expect:
scala> path(node)
res1: String = /start/nodes/node/name
It wouldn't be hard to tweak the path function to avoid explicit recursion or to gather more information as it walks up the tree, for example indicating position when necessary to avoid ambiguity:
scala> def path(element: Element) = {
| def sameName(f: Node => Node)(n: Node) =
| Stream.iterate(n)(f).tail.takeWhile(_ != null).filter(
| _.getNodeName == n.getNodeName
| ).toList
| val preceding = sameName(_.getPreviousSibling) _
| val following = sameName(_.getNextSibling) _
| "/" + Stream.iterate[Node](element)(_.getParentNode).map {
| case _: Document => None
| case e: Element => Some { (preceding(e), following(e)) match {
| case (Nil, Nil) => e.getTagName
| case (els, _) => e.getTagName + "[" + (els.size + 1) + "]"
| }}
| }.takeWhile(_.isDefined).map(_.get).reverse.mkString("/")
| }
path: (element: org.w3c.dom.Element)java.lang.String
Note that I've changed the type slightly to make it clear that this will only give us a valid XPath path for elements. We can test:
scala> path(node.asInstanceOf[Element])
res13: java.lang.String = /start/nodes/node[2]/name
This is again what we expect.
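Neither version above produces the [. = "Whatever 2"] predicate from the question. Here is a minimal sketch of one way to add it and to check the result by evaluating the generated expression with javax.xml.xpath. The names ReverseXPath, literal, elementPath, pathWithText and select are my own, not part of any library, and I'm assuming the element's text content is what you want to match on.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.{Document, Element, Node, NodeList}
import org.xml.sax.InputSource

// Hypothetical helper object; none of these names come from a library.
object ReverseXPath {
  // Parse an XML string into a DOM document.
  def parse(xml: String): Document =
    DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)))

  // Quote a string as an XPath 1.0 literal. XPath 1.0 has no escape
  // sequences, so text containing both quote characters needs concat().
  def literal(s: String): String =
    if (!s.contains("\"")) "\"" + s + "\""
    else if (!s.contains("'")) "'" + s + "'"
    else s.split("\"", -1).map("\"" + _ + "\"").mkString("concat(", ", '\"', ", ")")

  // Same idea as the first `path` function: walk up to the document root.
  def elementPath(node: Node): String = node match {
    case null | _: Document => ""
    case n => elementPath(n.getParentNode) + "/" + n.getNodeName
  }

  // Append a predicate that compares the element's string value to its text.
  def pathWithText(e: Element): String =
    elementPath(e) + "[. = " + literal(e.getTextContent) + "]"

  // Evaluate an XPath expression against the document and return the matches.
  def select(doc: Document, xpath: String): Seq[Node] = {
    val nodes = XPathFactory.newInstance().newXPath()
      .evaluate(xpath, doc, XPathConstants.NODESET).asInstanceOf[NodeList]
    (0 until nodes.getLength).map(i => nodes.item(i))
  }
}
```

For the example document, ReverseXPath.pathWithText applied to the second <name> element gives /start/nodes/node/name[. = "Whatever 2"], and select returns exactly that one node. Keep in mind that a text predicate only disambiguates when the text is unique among the matching elements; if two elements on the same path have identical text, you still need the positional version.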
As others have pointed out, if all you have is a scala.xml.Node, you're not going to achieve your goal without spending a ridiculous amount of time and space.
However, if you're willing to make your callers jump through a few hoops, and you find the idea of dropping down to Java distasteful, you could do worse than to try a zipper.
Also see Daniel Spiewak's zipper implementation in Anti-XML (likely to replace Scala's built-in XML support someday).
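To give a feel for the "hoops", here is a stripped-down, read-only cursor in the spirit of a zipper. It is not Anti-XML's API and not a full zipper (it keeps no sibling context, so it can't rebuild the tree after an update). Loc, children, child and path are hypothetical names, and the sketch assumes the scala-xml module is on the classpath. The idea is that callers navigate through Loc values instead of passing bare nodes around, so the ancestor chain is always available.

```scala
import scala.xml.{Elem, Node, XML}

// A location: the current node plus the ancestors walked through to reach it,
// nearest ancestor first. Hypothetical type, for illustration only.
final case class Loc(node: Node, ancestors: List[Elem]) {
  // Child elements, each remembering the current element as its parent.
  def children: Seq[Loc] = node match {
    case e: Elem => e.child.collect { case c: Elem => Loc(c, e :: ancestors) }
    case _       => Nil
  }

  // The index-th child element with the given label (0-based), if any.
  def child(label: String, index: Int = 0): Option[Loc] =
    children.filter(_.node.label == label).lift(index)

  // Path from the root, built from the remembered ancestors.
  def path: String =
    (node :: ancestors).reverse.map(_.label).mkString("/", "/", "")
}
```

Usage on the document from the question might look like this:

```scala
val root = XML.loadString(
  "<start><nodes><node><name>Whatever</name></node>" +
  "<node><name>Whatever 2</name></node></nodes></start>")

val loc = Loc(root, Nil)
  .child("nodes")
  .flatMap(_.child("node", 1))
  .flatMap(_.child("name"))

loc.map(_.path) // Some("/start/nodes/node/name")
```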
It sounds like you are looking for a function like path(Node): XPath. Unfortunately, this cannot be done efficiently with scala.xml, because nodes have no parent reference. The options are: 1) Search the tree from the root and identify the right node when you find it. 2) Use another XML library (Scala or Java) that supports parent references, for example Anti-XML.
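A rough sketch of the first option (searching the tree), assuming you still hold the root and that scala-xml is on the classpath. ScalaXmlPath and pathTo are made-up names. The search compares by reference (eq), because == on scala.xml nodes is structural and would match any element with the same content. It is linear in the size of the tree for every lookup, which is the inefficiency mentioned above.

```scala
import scala.xml.{Elem, Node, XML}

// Hypothetical helper; not part of scala.xml.
object ScalaXmlPath {
  // Depth-first search from `root` for the node that is the very same object
  // as `target`, building the path of element labels on the way down.
  def pathTo(root: Node, target: Node): Option[String] = {
    def step(current: Node, prefix: String): Option[String] = {
      val here = prefix + "/" + current.label
      if (current eq target) Some(here)
      else current.child.iterator
        .collect { case e: Elem => e }        // descend into elements only
        .map(step(_, here))                   // lazy: stops at the first hit
        .collectFirst { case Some(p) => p }
    }
    step(root, "")
  }
}
```

With root loaded from the question's document via XML.loadString and target taken as (root \\ "name")(1), ScalaXmlPath.pathTo(root, target) returns Some("/start/nodes/node/name"). A node that is not part of the tree gives None.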