
Selecting specific XML nodes in R?

Source: https://www.devze.com, 2023-02-22 17:12 (reposted from the web)

I am using the XML package in R to parse an XML file that has the following structure.

 <document id="Something" origId="Text">
    <sentence id="Something" origId="thisorig" text="Blah Blah.">
    <special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
    </sentence>
     <sentence id="Something" origId="thisorig" text="Blah Blah.">
      </sentence>
</document>

I want to put the <sentence> nodes that contain a <special> element in one variable and the <sentence> nodes without a <special> element in another.

Is it possible to do this in R? Any pointers or answers would be very helpful.


I added a few more cases to test for exceptions:

<document id="Something" origId="Text">
    <sentence id="Something" origId="thisorig" text="Blah Blah.">
    <special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
    </sentence>
    <sentence id="Else" origId="thatorig" text="Blu Blu.">
      <special id="id.s0.i1" origId="1" e1="en1" e2="en2" type="" directed="True"/>
    </sentence>
     <sentence id="Something" origId="thisorig" text="Blah Blah.">
       <notso id = "hallo" />
      </sentence>
     <sentence id="Something no sentence" origId="thisOther" text="Blah Blah.">
      </sentence>
</document>

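library(XML)
# Parse the file into an internal tree so that XPath queries can run on it
doc <- xmlInternalTreeParse("sentence.xml")
# <sentence> nodes that have a <special> child
hasSpecial <- xpathApply(doc, "//sentence/special/..")
# <sentence> nodes that have no <special> child
noSpecial <- xpathApply(doc, "/document/sentence[not(child::special)]")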
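
To try the two queries without a file on disk, the test document can be parsed from a string. This is only a sketch: the variable names (xml_text, with_special, without_special, ids_with, ids_without) are made up for illustration, and the predicate form sentence[special] is used as an equivalent way of writing the parent-step query above.

```r
library(XML)

# Inline copy of the test cases above, so no sentence.xml is needed.
xml_text <- '<document id="Something" origId="Text">
  <sentence id="Something" origId="thisorig" text="Blah Blah.">
    <special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
  </sentence>
  <sentence id="Else" origId="thatorig" text="Blu Blu.">
    <special id="id.s0.i1" origId="1" e1="en1" e2="en2" type="" directed="True"/>
  </sentence>
  <sentence id="Something" origId="thisorig" text="Blah Blah.">
    <notso id="hallo"/>
  </sentence>
  <sentence id="Something no sentence" origId="thisOther" text="Blah Blah."/>
</document>'

# asText = TRUE tells xmlParse that the argument is XML content, not a file name.
doc <- xmlParse(xml_text, asText = TRUE)

# Sentences with at least one <special> child, and sentences with none.
with_special    <- getNodeSet(doc, "/document/sentence[special]")
without_special <- getNodeSet(doc, "/document/sentence[not(special)]")

# Pull one attribute from each node to check which sentences landed where.
ids_with    <- sapply(with_special, xmlGetAttr, "id")
ids_without <- sapply(without_special, xmlGetAttr, "id")
```

The sentence that contains only a <notso> child ends up in without_special, which is the exception the extra test cases were meant to cover.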


Parse the XML tree, then use XPath to specify the location of the nodes.

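# useInternalNodes = TRUE keeps the tree in a form that XPath can query
doc <- xmlTreeParse("test.xml", useInternalNodes = TRUE)
# Selects the <special> elements themselves, anywhere below <document>
special_nodes <- getNodeSet(doc, "/document//special")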
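
Because this query returns the <special> elements rather than their enclosing sentences, one possible follow-up is to step from each match to its parent with xmlParent. The sketch below assumes the document from the question, parsed from a string; xml_text, special_ids and parent_texts are illustrative names only.

```r
library(XML)

# The document from the question, inlined so the sketch is self-contained.
xml_text <- '<document id="Something" origId="Text">
  <sentence id="Something" origId="thisorig" text="Blah Blah.">
    <special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
  </sentence>
  <sentence id="Something" origId="thisorig" text="Blah Blah."/>
</document>'

doc <- xmlParse(xml_text, asText = TRUE)
special_nodes <- getNodeSet(doc, "/document//special")

# Attributes of the <special> elements themselves.
special_ids <- sapply(special_nodes, xmlGetAttr, "id")

# Walk up from each <special> to its enclosing <sentence> and read its text.
parent_texts <- sapply(special_nodes, function(node) {
  xmlGetAttr(xmlParent(node), "text")
})
```

If a sentence held several <special> children, this approach would visit that sentence once per child, so an XPath predicate on the sentence is probably the simpler choice when only the sentences are wanted.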