I have a big (~40mb) collection of XML data, split in many files which are not well formed, so i merge them, add a root node and load all the xml in a XmlDocument
. Its basically a list of 3 different types which can be nested in a few different ways. This example should show most of the cases:
<Root>
<A>
<A>
<A></A>
<A></A>
</A>
</A>
<A />
<B>
<A>
<A>
<A></A>
<A></A>
</A>
</A>
</B>
<C />
</Root>
Im separating all A, B and C nodes by using XPath expressions on a XmlDocument
(//A
, //B
, //C
), convert the resulting nodesets to a datatable and show a list of all nodes of each nodetype separately in a Datagridview. This works fine.
But now Im facing an even bigger file and as soon as i load it, it shows me only 4 rows. Then i added a breakpoint at the line where the actual XmlDocument.SelectNodes
happens and checked the resulting NodeSet
. It shows me about 25,000 entries. After continuing the program loaded and whoops, all my 25k rows were shown. I tried it again and i can reproduce it. If i step over XmlDocument.SelectNodes by hand, it works. If i dont break there, it does not. Im not spawning a single thread in my application.
How can i debug this any further? What to look for? I have experienced such behaviour with multithreaded libraries such as jsch (ssh) but im dont see why this should happen in my case.
Thank you very much!
// class XmlToDataTable:
private DataTable CreateTable(NamedXPath logType,
List<XmlColumn> columns,
ITableCreator tableCreator)
{
// I have to break here -->
XmlNodeList xmlNodeList = logFile.GetEntries(logType);
// <-- I have to break here
DataTable dataTable = tableCreator.CreateTableLayout(columns);
foreach (XmlNode xmlNode in xmlNodeList)
{
DataRow row = dataTable.NewRow();
tableCreator.PopulateRow(xmlNode, row, columns);
dataTable.Rows.Add(row);
}
return dataTable;
}
// class Log开发者_如何学Cfile:
public XmlNodeList GetEntries(NamedXPath e)
{
return (_xmlDocument != null && _xmlDocument.HasChildNodes)
? _xmlDocument.SelectNodes(e.XPath)
: new XmlNullObjectNodeList();
}
// _xmlDocument gets loaded here after reading all xml fragments into a string
// (ugly, i know. the // ugly! comment reminds me about that ;))
private void CreateXmlDoc()
{
_xmlDocument = new XmlDocument();
_xmlDocument.LoadXml(OPEN_ROOT_ELEMENT + _xmlString +
CLOSE_ROOT_ELEMENT);
if (DataChanged != null)
DataChanged(this, new EventArgs());
}
// class NamedXPath:
public abstract class NamedXPath
{
private readonly String _name;
private readonly String _xPath;
protected NamedXPath(string name, string xPath)
{
_name = name;
_xPath = xPath;
}
public string Name
{
get { return _name; }
}
public string XPath
{
get { return _xPath; }
}
}
Instead of using XPath directly in the code first, I would use a tool such as sketchPath to get my XPath right. You can either load your original XML or use subset of original XML.
Play with XPath and your XML to see if the expected nodes are getting selected before using xpath in your code.
Okay, solved it. tableCreator
is part of my strategy pattern, which influences the way the table is built. In a certain implementation I do something like this:
XmlNode xn = xmlDocument.SelectSingleNode(fancyXPath);
// if a node has ancestors, then its a linked list:
// <a><a><a></a></a></a>
if(xn.SelectSingleNode("a") != null)
xn.SelectSingleNode("a").InnerText = "<IDs of linked list items CSV like here>";
Which means im replacing parts of a xml linked list with some text and lose the nested items there.
Wouldn't be a problem to find this bug if this change wouldn't affect the original XmlDocument
. Even then, debugging it should not be too hard. What makes my program behaving differently depending whether I break or not seems to be the following:
Return Value: The first XmlNode that matches the XPath query or null if no matching node is found. The XmlNode should not be expected to be connected "live" to the XML document. That is, changes that appear in the XML document may not appear in the XmlNode, and vice versa. (API Description of XmlNode.SelectNodes())
If I break there, the changes are written back to the original XmlDocument, if I don't break, its not written back. Can't really explain that to myself, but without the change in the XmlNode everything works.
edit:
Now im quite sure: I had XmlNodeList.Count in my watches. This means, everytime i debugged, VS called the property Count
, which not only returns a number but calls ReadUntil(int), which refreshes the internal list:
internal int ReadUntil(int index)
{
int count = this.list.Count;
while (!this.done && (count <= index))
{
if (this.nodeIterator.MoveNext())
{
XmlNode item = this.GetNode(this.nodeIterator.Current);
if (item != null)
{
this.list.Add(item);
count++;
}
}
else
{
this.done = true;
return count;
}
}
return count;
}
This may have caused that weird behavior.
精彩评论