Would someone be able to help with how I could implement this, or at least the algorithm to use for this.
What I am trying to do is parse a a hierarchical/tree structure file into a relation store. I will explain further below, with an example.
This is a sample source file, just a simple/non-realistic example for purposes of this question.
<title text=“title1">
<comment id=“comment1">
<data> this is part of comment one</data>
<data> this is some more of comment one</data>
</comment>
<comment id=“comment2”>
<data> this is part of comment two</data>
<data> this is some more of comment two</data>
<data> this is even some more of comment two</data>
</comment>
</title>
So the main thing to note here is that the number of <comment>
, and the number of <data>
elements for each comment may be arbitrary. So given the above, I would want to transform into something looking like:
title | comment | data
------------------------------------------------------------------------
title1 comment1 this is some part of comment one
title1 comment1 this is some more of comment one
title1 comment2 this is part of comment two
title1 comment2 this is some more of comment two
title1 comment2 this is even some more of comment two
In order to make this happen, lets say I can have specified the relational schema in the following manner, using an xpath expression that can be evaluated on the source file.
attribute1: title = /title/@title
attribute2: comment = /tit开发者_StackOverflow中文版le/comment/@id
attribute3: data = /title/comment/data/text()
Suggested Data-structures:
- ResultSet is a
List<Map<String,String>>
(where: each map represents a single row) - Schema is a
Map<String,String>
(where: we map attribute-name --> path expression) - Source file, some
DOM Document
I'm not sure whether you're asking how to implement the XML parser itself or how, given a parse tree for the XML, how to flatten it into a hierarchical structure. I'm guessing that you're looking at the latter of these now (there are many good XML parsers out there and I doubt that's the bottleneck), so I'll answer that here. Let me know if you're actually interested in the XML parsing detail and I can update the answer.
I believe that the way you want to think about this is with a recursive descent over the tree. The idea is as follows: your naming system consists of the concatenation of all the nodes above you in the tree followed by your own name. Given that, you could run a recursive DFS over the tree using something like this:
FlattenXML(XMLDocument x) {
for each top-level XML node t:
RecFlattenTree(t, "");
}
RecFlattenTree(Tree t, String prefix) {
if t is a leaf with data d:
update the master table by adding (prefix, d) to the list of entries
else
for each child c of t, whose name is x:
RecFlattenTree(c, prefix + "/" + x)
}
For example, if you were to trace this over the XML document you had up top, it might go something like this:
RecFlattenTree(title1, "/title1")
RecFlattenTree(comment1, "/title1/comment1")
RecFlattenTree(data node 1 , "/title1/comment1")
Add /title1/comment1/data, value = "this is some part of comment one"
RecFlattenTree(data node 2, "/title1/comment1")
Add /title1/comment2/data, value = "this is some more of comment one"
RecFlattenTree(comment2, "/title1/comment2")
RecFlattenTree(data node 1 , "/title1/comment2")
Add /title1/comment2/data, value = "this is part of comment two"
RecFlattenTree(data node 2, "/title1/comment2")
Add /title1/comment2/data, value = "this is more of comment two"
RecFlattenTree(data node 3, "/title1/comment2")
Add /title1/comment2/data, value = "this is even more of comment two"
Which ends up generating the list
/title1/comment1/data, value = "this is some part of comment one"
/title1/comment1/data, value = "this is some more of comment one"
/title1/comment1/data, value = "this is part of comment two"
/title1/comment1/data, value = "this is more of comment two"
/title1/comment1/data, value = "this is even more of comment two"
Which is exactly what you want.
Hope this helps! Let me know if I misinterpreted your question!
精彩评论