
Create NodeList of all Document nodes manually

devze.com https://www.devze.com 2023-03-26 15:22 Source: web
I currently generate a NodeList of all the nodes of a Document (in document order) manually. The XPath expression that selects the same nodes is:

//. | //@* | //namespace::*

My first attempt at walking the DOM manually and collecting the nodes (NodeSet is a primitive NodeList implementation that delegates to a List):

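import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Pre-order walk: the node itself, then its attributes, then its children.
// NodeSet is the List-backed NodeList implementation mentioned above.
private static void walkRecursive(Node cur, NodeSet nodes) {
    nodes.add(cur);

    if (cur.hasAttributes()) {
        NamedNodeMap attrs = cur.getAttributes();
        for (int i=0; i < attrs.getLength(); i++) {
            Node child = attrs.item(i);
            walkRecursive(child, nodes);
        }
    }

    // Attr nodes have Text children in DOM, so only descend into elements and the document
    int type = cur.getNodeType();
    if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
        NodeList children = cur.getChildNodes();
        if (children == null)
            return;

        for (int i=0; i < children.getLength(); i++) {
            Node child = children.item(i);
            walkRecursive(child, nodes);
        }
    }
}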

I start the recursion by calling walkRecursive(doc, nodes), where doc is the org.w3c.dom.Document and nodes a (still empty) NodeSet.
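To try the walk outside the original project, here is a self-contained sketch. It swaps NodeSet for a plain java.util.List<Node> (an assumption, since NodeSet's code isn't shown), and the class ManualWalk and its helpers parse and collect are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ManualWalk {

    // Pre-order walk: node itself, then its attributes, then its children.
    static void walkRecursive(Node cur, List<Node> nodes) {
        nodes.add(cur);
        NamedNodeMap attrs = cur.getAttributes(); // null for anything but elements
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                walkRecursive(attrs.item(i), nodes);
            }
        }
        short type = cur.getNodeType();
        if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
            NodeList children = cur.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                walkRecursive(children.item(i), nodes);
            }
        }
    }

    // Namespace-aware parse; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Parses the XML and returns every node reached by the walk, in walk order.
    static List<Node> collect(String xml) {
        List<Node> nodes = new ArrayList<>();
        walkRecursive(parse(xml), nodes);
        return nodes;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<myns:root xmlns:myns=\"http://www.my.ns/#\">\n"
                + "  <myns:element/>\n"
                + "</myns:root>";
        for (Node n : collect(xml)) {
            System.out.println(n.getNodeType() + " " + n.getNodeName());
        }
    }
}
```

On the test document from the question this collects six nodes (document, root, xmlns:myns, text, element, text), the same six entries as the manually generated list shown in the question.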

I tested this with the following minimal XML document:

<?xml version="1.0"?>
<myns:root xmlns:myns="http://www.my.ns/#">
  <myns:element/>
</myns:root>

If I canonicalize both my manually created NodeSet and the NodeList generated by the XPath expression above and compare the two results byte for byte, they are equal, so the manual walk seems to work just fine.

But if I iterate over the two NodeLists and print debug info (typeString simply returns a string representation of the node type):

for (int i=0; i < nodes.getLength(); i++) {
    Node child = nodes.item(i);
    System.out.println("Type: " + typeString(child.getNodeType()) +
                       " Name:" + child.getNodeName() + 
                       " Local name: " + child.getLocalName() +
                       " NS: " + child.getNamespaceURI());
}
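The question doesn't show typeString. A hypothetical version that reproduces the labels in the output below might look like this; the labels for node types that don't appear in the output (CDATASection, Comment, ProcessingInstruction) are my own choice, and NodeTypes and describe are invented names.

```java
import org.w3c.dom.Node;

final class NodeTypes {

    // Hypothetical helper: maps DOM node-type constants to the labels used in the debug output.
    static String typeString(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE:               return "DocumentNode";
            case Node.ELEMENT_NODE:                return "Element";
            case Node.ATTRIBUTE_NODE:              return "Attribute";
            case Node.TEXT_NODE:                   return "Text";
            case Node.CDATA_SECTION_NODE:          return "CDATASection";
            case Node.COMMENT_NODE:                return "Comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            default:                               return "Unknown(" + type + ")";
        }
    }

    // Builds one debug line in the same format as the print loop.
    static String describe(Node n) {
        return "Type: " + typeString(n.getNodeType())
                + " Name:" + n.getNodeName()
                + " Local name: " + n.getLocalName()
                + " NS: " + n.getNamespaceURI();
    }
}
```

With this, the body of the loop reduces to System.out.println(NodeTypes.describe(child)).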

then I receive this output for the XPath-generated NodeList:

Type: DocumentNode Name:#document Local name: null NS: null
Type: Element Name:myns:root Local name: root NS: http://www.my.ns/#
Type: Attribute Name:xmlns:myns Local name: myns NS: http://www.w3.org/2000/xmlns/
Type: Attribute Name:xmlns:xml Local name: xml NS: http://www.w3.org/2000/xmlns/
Type: Text Name:#text Local name: null NS: null
Type: Element Name:myns:element Local name: element NS: http://www.my.ns/#
Type: Text Name:#text Local name: null NS: null

and this for the manually generated NodeList:

Type: DocumentNode Name:#document Local name: null NS: null
Type: Element Name:myns:root Local name: root NS: http://www.my.ns/#
Type: Attribute Name:xmlns:myns Local name: myns NS: http://www.w3.org/2000/xmlns/
Type: Text Name:#text Local name: null NS: null
Type: Element Name:myns:element Local name: element NS: http://www.my.ns/#
Type: Text Name:#text Local name: null NS: null

So, as you can see, the XPath-generated NodeList additionally contains a node for the xml namespace declaration:

Type: Attribute Name:xmlns:xml Local name: xml NS: http://www.w3.org/2000/xmlns/

Now my questions:

a) If I interpret xml-names11 (Namespaces in XML 1.1) correctly, then I don't need the xmlns:xml declaration:

The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be undeclared or bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.

Am I correct? (Point c) below at least hints in that direction.)

b) But then, why does the XPath evaluation add it anyway? Shouldn't it just include what was there in the first place instead of automagically adding things?

c) This can cause trouble with XML canonicalization, although it shouldn't: declarations of the xml namespace are supposed to be omitted during canonicalization. Does anyone know of (Java) implementations that get this wrong?
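If a particular canonicalizer does emit the extra declaration, one defensive option is to drop it from the XPath result before canonicalizing. The following is a minimal sketch under that assumption: it treats any attribute in the xmlns namespace (XMLConstants.XMLNS_ATTRIBUTE_NS_URI) with local name xml as the xml-prefix declaration. XmlNsFilter and its methods are hypothetical names, and whether such filtering is needed at all depends on the canonicalizer in use.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XmlNsFilter {

    // True for a DOM node representing a declaration of the "xml" prefix,
    // i.e. an attribute named xmlns:xml in the xmlns namespace.
    static boolean isXmlPrefixDeclaration(Node n) {
        return n.getNodeType() == Node.ATTRIBUTE_NODE
                && XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(n.getNamespaceURI())
                && XMLConstants.XML_NS_PREFIX.equals(n.getLocalName());
    }

    // Copies a NodeList in order, dropping any xml-prefix declaration nodes.
    static List<Node> withoutXmlPrefixDeclarations(NodeList in) {
        List<Node> out = new ArrayList<>(in.getLength());
        for (int i = 0; i < in.getLength(); i++) {
            Node n = in.item(i);
            if (!isXmlPrefixDeclaration(n)) {
                out.add(n);
            }
        }
        return out;
    }
}
```

The filtered list keeps document order, so it can be handed on wherever a List<Node> or Set<Node> node set is accepted.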


Edit:

Here's the code I used to evaluate the XPath expression whose result contained the 'xml' namespace node:

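import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);   // needed for namespace URIs and namespace nodes
dbf.setValidating(false);
InputStream in = ...;
try {
    Document doc = dbf.newDocumentBuilder().parse(in);
    XPathFactory fac = XPathFactory.newInstance();
    XPath xp = fac.newXPath();
    // All tree nodes, all attributes and all namespace nodes
    XPathExpression exp = xp.compile("//. | //@* | //namespace::*");
    NodeList nodes = (NodeList) exp.evaluate(doc, XPathConstants.NODESET);
} finally {
    in.close();
}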


Answer:

Since you can write

<myns:root xml:space="preserve" xmlns:myns="http://www.my.ns/#">
  <myns:element/>
</myns:root>

without declaring the "xml" prefix, the prefix must be bound implicitly. It is therefore correct for the //namespace::* location step to include a namespace node for this implicit declaration.
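The implicit binding is easy to check with a namespace-aware JAXP parser. In this sketch (class and method names are hypothetical) the undeclared xml:space attribute resolves to XMLConstants.XML_NS_URI, while the root element still carries only the one explicit xmlns:myns declaration. That suggests the extra xmlns:xml node in the question is supplied by the XPath layer rather than by the parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

final class XmlPrefixDemo {

    // Namespace-aware parse; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Namespace URI the parser assigned to xml:space on the element (null if absent).
    static String xmlSpaceNamespace(Element e) {
        Attr a = e.getAttributeNodeNS(XMLConstants.XML_NS_URI, "space");
        return a == null ? null : a.getNamespaceURI();
    }

    // Number of namespace-declaration attributes (xmlns, xmlns:p) present on the element.
    static int namespaceDeclarationCount(Element e) {
        NamedNodeMap attrs = e.getAttributes();
        int count = 0;
        for (int i = 0; i < attrs.getLength(); i++) {
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attrs.item(i).getNamespaceURI())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<myns:root xml:space=\"preserve\" xmlns:myns=\"http://www.my.ns/#\">\n"
                + "  <myns:element/>\n"
                + "</myns:root>";
        Element root = parse(xml).getDocumentElement();
        System.out.println("xml:space namespace: " + xmlSpaceNamespace(root));
        System.out.println("declarations on root: " + namespaceDeclarationCount(root));
    }
}
```

Running main should report http://www.w3.org/XML/1998/namespace for xml:space and a single declaration on the root element.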

So,

a) You are wrong: you do need it (well, depending on the purpose of your code).

b) see above

c) No, but I've seen other namespace corner cases where things went haywire (e.g. "Problem with conversion of org.dom4j.Document to org.w3c.dom.Document and XML Signature").
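For b), the XPath 1.0 data model (section 5.4, Namespace Nodes) gives every element one namespace node per prefix in scope, explicitly including the xml prefix, plus one for the default namespace if one is in scope. The sketch below approximates that axis over a DOM element by walking up the ancestors and collecting xmlns attributes, nearest declaration first, then adding xml unconditionally. InScopeNamespaces and its methods are hypothetical names, and the JDK's XPath engine is not claimed to work this way internally.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

final class InScopeNamespaces {

    // prefix -> namespace URI for every prefix in scope on e ("" = default namespace),
    // mirroring the namespace nodes XPath 1.0 attaches to an element.
    static Map<String, String> of(Element e) {
        Map<String, String> result = new LinkedHashMap<>();
        // Walk from e up through its ancestor elements; the nearest declaration wins.
        for (Node n = e; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                if (!XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                    continue; // ordinary attribute, not a namespace declaration
                }
                // xmlns="..." declares the default namespace; xmlns:p="..." declares prefix p.
                String prefix = XMLConstants.XMLNS_ATTRIBUTE.equals(a.getNodeName()) ? "" : a.getLocalName();
                result.putIfAbsent(prefix, a.getNodeValue());
            }
        }
        // An empty value is an undeclaration (xmlns="" in XML 1.0, xmlns:p="" in XML 1.1).
        result.values().removeIf(String::isEmpty);
        // The xml prefix is in scope everywhere, whether or not it was declared.
        result.putIfAbsent(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
        return result;
    }

    // Namespace-aware parse returning the root element; checked exceptions are wrapped.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                      .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse document", ex);
        }
    }
}
```

For myns:element in the question's test document this yields exactly two entries, myns and xml, even though xml is never declared; that is the namespace node the XPath result reports.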
