deleting unwanted xml nodes

Developer https://www.devze.com 2023-01-04 17:39 Source: web

Related topics: r regex xml

I have a bunch of .xml files with nodes that are causing unnecessary complications. I would like to remove these nodes but ensure that their children are preserved (not the hierarchical structure, but the data). Eventually I want to take the data from each .xml file and build a data frame. It seems like xmlTreeParse along with xmlToList will help, but the latter only works well with a flat structure. I have played around with unlisting the output from xmlToList and then converting it to a data frame, but the output is a bit funky.

I thought about simply writing a function to go through all the files and delete all the tags that I don't want; however, I don't know how to do this in R.

Any suggestions?

It's simple to do in XSLT. Add this to the identity transform:

<xsl:template match="poop">
   <xsl:apply-templates select="node()"/>
</xsl:template>

Using regular expressions on XML hastens the coming of the Elder Gods and is not recommended.

See if this is what you are looking for: you can use the XML package from CRAN to parse XML documents. The following extracts only the <poop> nodes:

library(XML)
me <- xmlTreeParse(filename, useInternalNodes = TRUE)
pooptags <- xpathApply(me, "//poop")

pooptags will contain the matching nodes, for example:

<poop>
  <P3a_Village1>dzemeni</P3a_Village1>
  <P4_HousholdNumber/>
  <P5_VisitNumber>2</P5_VisitNumber>
</poop>

You can paste this after an <?xml version='1.0' ?> declaration using the paste command in R and write it to a truncated file. Or you can extract further information, such as P3a_Village1, from the XML file using xpathApply like this:

village <- xpathApply(me, "//poop/P3a_Village1")
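
The first option mentioned above, writing a matched node to its own file after an XML declaration, might look roughly like the sketch below. The inline sample document, the temporary output file, and the use of saveXML to serialize the node are assumptions for illustration; they are not part of the original answer.

```r
library(XML)

# Hypothetical sample input standing in for one of the .xml files.
xml_text <- paste0(
  "<survey><poop>",
  "<P3a_Village1>dzemeni</P3a_Village1>",
  "<P4_HousholdNumber/>",
  "<P5_VisitNumber>2</P5_VisitNumber>",
  "</poop></survey>"
)

doc <- xmlParse(xml_text, asText = TRUE)
pooptags <- getNodeSet(doc, "//poop")

# Serialize the first matching node to text and prepend an XML declaration.
node_text <- saveXML(pooptags[[1]])
out_file <- tempfile(fileext = ".xml")  # assumed output location
writeLines(c("<?xml version='1.0'?>", node_text), out_file)

# Re-read the file to check that it is well-formed and rooted at <poop>.
trimmed <- xmlParse(out_file)
root_name <- xmlName(xmlRoot(trimmed))
```

Here writeLines is used instead of paste only to avoid building one long string by hand; the result is the same kind of truncated file.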

I hope the solution is what you are looking for. Please let me know if it helps.
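
For the data frame the question asks for, one possible route is to leave the unwanted wrapper nodes in place and select only the leaf elements, so the nesting above them no longer matters. The sketch below assumes a small inline sample document and a hypothetical helper named leaves_to_df; the XPath expression //*[not(*)] selects every element that has no element children.

```r
library(XML)

# Hypothetical sample document; <poop> is the unwanted wrapper node.
xml_text <- "<?xml version='1.0'?>
<survey>
  <poop>
    <P3a_Village1>dzemeni</P3a_Village1>
    <P4_HousholdNumber/>
    <P5_VisitNumber>2</P5_VisitNumber>
  </poop>
</survey>"

# Hypothetical helper: one row per leaf element, ignoring the nesting above it.
leaves_to_df <- function(doc) {
  # Select every element that has no element children.
  leaves <- getNodeSet(doc, "//*[not(*)]")
  values <- vapply(leaves, function(node) {
    v <- xmlValue(node)
    # Treat empty elements as missing, whether they yield "" or character(0).
    if (length(v) == 0 || !nzchar(v)) NA_character_ else v
  }, character(1))
  data.frame(
    name = vapply(leaves, xmlName, character(1)),
    value = values,
    stringsAsFactors = FALSE
  )
}

doc <- xmlParse(xml_text, asText = TRUE)
flat <- leaves_to_df(doc)
print(flat)
```

The long name/value layout is a design choice: files with different sets of tags can still be stacked with rbind, and reshaped to one column per tag afterwards if needed.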
