I have a large XML file (~18 MB). Apparently there is a tag somewhere in it that isn't closed. I know this because when I ran the W3C markup validation tool (validator.w3.org), I got the following error:
You may have neglected to close an element, or perhaps you meant to "self-close" an element, that is, ending it with "/>" instead of ">".
My question is how I might go about finding this unclosed element among the 500,000 lines in the file. Is there a tool I could use that would suggest places where there might be a problem, such as an element that has not been closed after a certain number of lines?
Any ideas would be much appreciated.
I use Notepad++, which has an excellent XML Tools plugin that lets you check XML syntax and takes you to the problematic line. It also has other useful utilities, such as pretty-printing.
I just opened an XML file in VS 2010 (with ReSharper) and broke the XML, and what do you know? The error was highlighted immediately. If you have access to the same tools, it's that simple.
xmllint is a standard tool for this. From the Validation & DTDs page:
The simplest way is to use the xmllint program included with libxml. The --valid option turns-on validation of the files given as input. For example the following validates a copy of the first revision of the XML 1.0 specification:
xmllint --valid --noout test/valid/REC-xml-19980210.xml
The --noout option is used to disable output of the resulting tree.
The --dtdvalid dtd allows validation of the document(s) against a given DTD.
Libxml2 exports an API to handle DTDs and validation, check the associated description.
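If your document isn't "pretty-printed", it can still be hard to find the offending node, so you might want to use xmllint to rewrite the file with indentation (its --format option does this). Also note that an unclosed tag is a well-formedness error rather than a validity error, so --valid isn't needed here: xmllint --noout yourfile.xml already reports the line where parsing failed.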
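Note that xmllint, like most parsers, reports where it gave up, which for an unclosed tag is often the parent's end tag, possibly far below the tag itself. A minimal sketch of the question's "still open after N lines" idea, assuming Java 16+ and only the JDK's built-in StAX API (the class and record names are made up for illustration): stream through the file keeping a stack of open elements with their start lines, and print that stack when the parser fails. The unclosed element is usually one of those listed, and one opened suspiciously many lines before the error is a good first suspect.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class OpenElementTracker {

    /** An element whose start tag has been read but whose end tag has not. */
    record OpenElement(String name, int line) {}

    /**
     * Streams through the document keeping a stack of open elements.
     * Returns an empty list if the whole document parses; otherwise prints the
     * parser's message to stderr and returns the elements still open at that
     * point, outermost first.
     */
    static List<OpenElement> findOpenElementsAtError(XMLStreamReader reader) {
        Deque<OpenElement> stack = new ArrayDeque<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Remember the line the parser reports for this start tag
                    stack.push(new OpenElement(reader.getLocalName(),
                            reader.getLocation().getLineNumber()));
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    stack.pop();
                }
            }
            return List.of();
        } catch (XMLStreamException e) {
            System.err.println("Parse error: " + e.getMessage());
            List<OpenElement> open = new ArrayList<>(stack); // innermost first
            Collections.reverse(open);                        // now outermost first
            return open;
        }
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            for (OpenElement e : findOpenElementsAtError(reader)) {
                System.out.println("<" + e.name() + "> opened on line " + e.line());
            }
        }
    }
}
```

Run it as java OpenElementTracker yourfile.xml (the file name is a placeholder). Only the currently open elements are kept in memory, so the size of the file doesn't matter much.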
Since you do not have an XML Schema, there is no foolproof way of finding the offending code; for example, XML allows recursive structures, so an element nested inside another of the same name is not necessarily a mistake. But you CAN write your own XML Schema, although that may be a lot to learn. Alternatively, I would write a simple, stupid validator that checks the nesting level and the element name, like so:
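import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

private void parseAndCheckStructure(XMLStreamReader reader) throws XMLStreamException {
    // First skip the prolog (XML declaration, comments, DOCTYPE) up to the root element;
    // this is probably not where the offending element is (?)
    int event = -1;
    while (reader.hasNext()) {
        event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
            break;
        } else if (event == XMLStreamConstants.END_DOCUMENT) {
            throw new XMLStreamException("No root element found", reader.getLocation());
        }
    }
    // Read the rest of the document; level is the current nesting depth (root = 1).
    int level = 1;
    do {
        event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
            level++;
            String localName = reader.getLocalName();
            // Known children are handed to your own methods, written with a loop like this one;
            // each must consume input up to and including its own END_ELEMENT.
            if (localName.equals("FirstElement")) {
                parseFirstElementWithALoopLikeTheCurrent(reader);
                level--;
            } else if (localName.equals("SecondElement")) {
                parseSecondElementWithALoopLikeTheCurrent(reader);
                level--;
            } else {
                // XMLStreamException(String, Location) puts the line and column in the message
                throw new XMLStreamException("Unknown element " + localName + " at level " + level,
                        reader.getLocation());
            }
        } else if (event == XMLStreamConstants.END_ELEMENT) {
            // keep track of level
            level--;
        }
    } while (level > 0);
}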
Alternatively, parse the whole document within the do-while loop above and do checks like:
if (level == 4 && localName.equals("MyElement")) {
    // ok
} else {
    // throw exception with the location
}
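Why the level check helps: a missing end tag pushes every following sibling one level deeper, so the first known element found at the wrong level usually sits right after the unclosed one, whereas the parser itself only complains when the parent's end tag arrives. A minimal generalization of the check above, assuming you can list the expected level of a few element names (the names record and field below are hypothetical; level 1 is the root, as in the loop above). It uses only the JDK's StAX API and the class name is made up.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DepthChecker {

    /**
     * Reports every element whose name is in expectedLevel but which starts at a
     * different nesting level (root = 1). Names not in the map are not checked.
     * Stops at the first parse error, which is reported last.
     */
    static List<String> checkLevels(XMLStreamReader reader, Map<String, Integer> expectedLevel) {
        List<String> problems = new ArrayList<>();
        int level = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    level++;
                    String name = reader.getLocalName();
                    Integer expected = expectedLevel.get(name);
                    if (expected != null && expected.intValue() != level) {
                        problems.add("<" + name + "> at level " + level + " (expected " + expected
                                + ") on line " + reader.getLocation().getLineNumber());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    level--;
                }
            }
        } catch (XMLStreamException e) {
            problems.add("Parser gave up: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical layout: <root><record><field/>...</record>...</root>
        Map<String, Integer> expected = Map.of("record", 2, "field", 3);
        try (InputStream in = new FileInputStream(args[0])) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            checkLevels(reader, expected).forEach(System.out::println);
        }
    }
}
```

The first reported line is the place to look; everything after it is usually just the same shift repeated.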
It sucks, but it works.
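For the XML Schema route mentioned above, a minimal sketch using the JDK's javax.xml.validation API, assuming you have written a schema for your format (the class name is made up, and any schema you pass in is your own). With no error handler set, the Validator stops at the first error and reports its line and column; with a schema that does not allow an element to contain another of the same kind, that first error lands just after the unclosed tag rather than at the end of the file.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SchemaCheck {

    /** Returns null if xml is valid against xsd, otherwise the location and message of the first error. */
    static String firstProblem(Source xml, Source xsd) throws IOException, SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsd);
        // No ErrorHandler is set, so validate() throws on the first error
        Validator validator = schema.newValidator();
        try {
            validator.validate(xml);
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    // Usage: java SchemaCheck yourfile.xml yourschema.xsd (placeholder file names)
    public static void main(String[] args) throws IOException, SAXException {
        String problem = firstProblem(new StreamSource(new File(args[0])),
                new StreamSource(new File(args[1])));
        System.out.println(problem == null ? "Valid" : problem);
    }
}
```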
Try opening the .xml file in the Chrome browser. It will point you to the line and column where the error was detected.