I currently generate a NodeList of all the Document nodes (in document order) manually. The XPath expression to get this NodeList is
//. | //@* | //namespace::*
My first attempt at walking the DOM manually and collecting the nodes (NodeSet is a primitive NodeList implementation delegating to a List):
private static void walkRecursive(Node cur, NodeSet nodes) {
    nodes.add(cur);
    // Attributes are not DOM children, so they have to be visited explicitly.
    if (cur.hasAttributes()) {
        NamedNodeMap attrs = cur.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node child = attrs.item(i);
            walkRecursive(child, nodes);
        }
    }
    // Only elements and the document itself are descended into.
    int type = cur.getNodeType();
    if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
        NodeList children = cur.getChildNodes();
        if (children == null)
            return;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            walkRecursive(child, nodes);
        }
    }
}
I start the recursion by calling walkRecursive(doc, nodes), where doc is the org.w3c.dom.Document and nodes is an (initially empty) NodeSet.
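For anyone who wants to reproduce this, the traversal can be run end to end with a plain java.util.List standing in for NodeSet. This is only a sketch under that assumption: the class name DomWalkSketch and the collect helper are made up for illustration and are not part of the original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomWalkSketch {

    // Hypothetical helper: parses the XML string namespace-aware and walks the result.
    static List<Node> collect(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<Node> nodes = new ArrayList<>();
            walkRecursive(doc, nodes);
            return nodes;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Same logic as the walker above, with List<Node> instead of NodeSet.
    static void walkRecursive(Node cur, List<Node> nodes) {
        nodes.add(cur);
        if (cur.hasAttributes()) {
            NamedNodeMap attrs = cur.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                walkRecursive(attrs.item(i), nodes);
            }
        }
        int type = cur.getNodeType();
        if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
            NodeList children = cur.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                walkRecursive(children.item(i), nodes);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<myns:root xmlns:myns=\"http://www.my.ns/#\">\n"
                + "<myns:element/>\n"
                + "</myns:root>";
        for (Node n : collect(xml)) {
            System.out.println(n.getNodeType() + " " + n.getNodeName());
        }
    }
}
```

Attribute nodes are passed through walkRecursive as well, but since they are neither elements nor documents, their text children are not collected.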
I tested this using this primitive XML document:
<?xml version="1.0"?>
<myns:root xmlns:myns="http://www.my.ns/#">
<myns:element/>
</myns:root>
If I, for example, canonicalize my manually created NodeSet and the NodeList generated by the XPath expression mentioned at the start, and compare the two byte for byte, the results are equal, so the traversal seems to work just fine.
But if I iterate over the two NodeLists and print debug info (typeString simply generates a string representation of the node type):
for (int i = 0; i < nodes.getLength(); i++) {
    Node child = nodes.item(i);
    System.out.println("Type: " + typeString(child.getNodeType()) +
            " Name:" + child.getNodeName() +
            " Local name: " + child.getLocalName() +
            " NS: " + child.getNamespaceURI());
}
then I receive this output for the XPath-generated NodeList:
Type: DocumentNode Name:#document Local name: null NS: null
Type: Element Name:myns:root Local name: root NS: http://www.my.ns/#
Type: Attribute Name:xmlns:myns Local name: myns NS: http://www.w3.org/2000/xmlns/
Type: Attribute Name:xmlns:xml Local name: xml NS: http://www.w3.org/2000/xmlns/
Type: Text Name:#text Local name: null NS: null
Type: Element Name:myns:element Local name: element NS: http://www.my.ns/#
Type: Text Name:#text Local name: null NS: null
and this for the manually generated NodeList:
Type: DocumentNode Name:#document Local name: null NS: null
Type: Element Name:myns:root Local name: root NS: http://www.my.ns/#
Type: Attribute Name:xmlns:myns Local name: myns NS: http://www.w3.org/2000/xmlns/
Type: Text Name:#text Local name: null NS: null
Type: Element Name:myns:element Local name: element NS: http://www.my.ns/#
Type: Text Name:#text Local name: null NS: null
So, as you can see, in the first case the NodeList additionally contains the Node for the XML namespace:
Type: Attribute Name:xmlns:xml Local name: xml NS: http://www.w3.org/2000/xmlns/
Now my questions:
a) If I interpret xml-names11 correctly, then I don't need the xmlns:xml declaration:
The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be undeclared or bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.
Am I correct? (at least c) hints in that direction)
b) But then, why does the XPath evaluation add it anyway - shouldn't it just include what was there in the first place instead of automagically adding things?
c) This can cause trouble with XML canonicalization, although it shouldn't: declarations of the xml namespace should be omitted during canonicalization. Does anyone know of (Java) implementations that get this wrong?
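While comparing the two lists, one option is to recognize the extra node and skip it, mirroring what canonicalization is supposed to do with it. A minimal sketch, assuming the XPath implementation exposes namespace nodes as attribute-like nodes named xmlns:xml (as in the debug output above). XmlPrefixFilterSketch, isImplicitXmlDeclaration and newDeclaration are hypothetical names, and newDeclaration exists only to build a node to test against.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class XmlPrefixFilterSketch {

    // True for a node that looks like
    // xmlns:xml="http://www.w3.org/XML/1998/namespace".
    static boolean isImplicitXmlDeclaration(Node n) {
        return n.getNodeType() == Node.ATTRIBUTE_NODE
                && "xmlns:xml".equals(n.getNodeName())
                && XMLConstants.XML_NS_URI.equals(n.getNodeValue());
    }

    // Illustration-only helper: builds a detached xmlns:<prefix> attribute
    // on a fresh, empty document.
    static Attr newDeclaration(String prefix, String uri) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            Attr attr = doc.createAttributeNS(
                    XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:" + prefix);
            attr.setValue(uri);
            return attr;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The check includes the value as well as the name, so only a declaration bound to the XML namespace URI is treated as the implicit one.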
Edit:
Here's the code I used to evaluate the XPath expression that contained the 'xml' namespace node:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setValidating(false);
InputStream in = ...;
try {
    Document doc = dbf.newDocumentBuilder().parse(in);
    XPathFactory fac = XPathFactory.newInstance();
    XPath xp = fac.newXPath();
    XPathExpression exp = xp.compile("//. | //@* | //namespace::*");
    NodeList nodes = (NodeList) exp.evaluate(doc, XPathConstants.NODESET);
} finally {
    in.close();
}
Since you can write
<myns:root xml:space="preserve" xmlns:myns="http://www.my.ns/#">
<myns:element/>
</myns:root>
without declaring the "xml" prefix, the binding must be there implicitly. It is therefore correct to include the namespace node for this namespace declaration in the //namespace::* location step.
So,
a) you are wrong, you need it (well, depending on the purpose of your code)
b) see above
c) no, but I've seen other namespace corner cases where things went haywire (e.g. "Problem with conversion of org.dom4j.Document to org.w3c.dom.Document and XML Signature")
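The implicit binding is easy to observe with a namespace-aware parser. The following is a sketch rather than a definitive test: ImplicitXmlPrefixSketch and namespaceOfRootAttribute are hypothetical names. It parses a document that uses xml:space without declaring the prefix and reports the namespace URI the parser assigned to that attribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ImplicitXmlPrefixSketch {

    // Returns the namespace URI of the named attribute on the root element,
    // or null if the root element has no such attribute.
    static String namespaceOfRootAttribute(String xml, String qualifiedName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Attr attr = doc.getDocumentElement().getAttributeNode(qualifiedName);
            return attr == null ? null : attr.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<myns:root xml:space=\"preserve\" xmlns:myns=\"http://www.my.ns/#\">\n"
                + "<myns:element/>\n"
                + "</myns:root>";
        System.out.println(namespaceOfRootAttribute(xml, "xml:space"));
    }
}
```

The document parses without a namespace error even though xmlns:xml never appears in it, which is the implicit binding described above.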