XPath to the tag inside CDATA

Developer https://www.devze.com 2023-03-27 04:12 Source: web

I want to find the XPath to a tag which is inside a CDATA section. Below is the XML fragment.

<books>
 <book>
  <title></title>
  <content><![CDATA[<p>Hi hello Hw r u?</p><p>We are fine</p><p>Hi babeeee!!!!</p>]]>    </content>
 </book>
</books>

I want to get the data which is inside the first <p> tag inside <content>. Can anybody please give the correct xpath to it?

A CDATA section contains arbitrary character data. In contrast to PCDATA (parsed character data), it is not parsed as markup, so there is no XPath to "elements" inside it.
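To make this concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree (my choice for illustration; the question does not name a toolchain). It loads the fragment from the question and checks what the parser actually sees inside <content>. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, unchanged.
XML = """<books>
 <book>
  <title></title>
  <content><![CDATA[<p>Hi hello Hw r u?</p><p>We are fine</p><p>Hi babeeee!!!!</p>]]>    </content>
 </book>
</books>"""

root = ET.fromstring(XML)
content = root.find("./book/content")

# The CDATA section contributes only character data,
# so <content> has no child elements at all.
child_count = len(content)

# Any path that looks for <p> elements therefore matches nothing.
p_matches = content.findall(".//p")

# What looks like markup is simply the text value of <content>.
raw_text = content.text
print(child_count, p_matches, repr(raw_text))
```

The <p> "tags" only show up as characters in raw_text, which is why no path expression can select them.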

As Leif said, the content of the CDATA section is not parsed, so it is just text, even though it looks like markup. You would have to parse it yourself, which you could do using Saxon (9.1 or later, commercial editions) and saxon:parse. You would then find that it is not a well-formed document (it has three top-level <p> elements and no single root element), so you would probably have to resort to a parser such as TagSoup to parse it.
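The same "parse it a second time" idea can be sketched outside Saxon. The Python example below is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree instead of saxon:parse or TagSoup, and it works around the missing root element by wrapping the text in a made-up <wrapper> element. The function name first_paragraph and the wrapper name are hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, unchanged.
XML = """<books>
 <book>
  <title></title>
  <content><![CDATA[<p>Hi hello Hw r u?</p><p>We are fine</p><p>Hi babeeee!!!!</p>]]>    </content>
 </book>
</books>"""


def first_paragraph(xml_text):
    """Return the text of the first <p> hidden inside the CDATA section."""
    # First pass: parse the outer document and read <content> as plain text.
    outer = ET.fromstring(xml_text)
    fragment = outer.findtext("./book/content")

    # The text has several top-level <p> elements and no single root,
    # so wrap it in a hypothetical <wrapper> element before parsing.
    inner = ET.fromstring("<wrapper>" + fragment + "</wrapper>")

    # Second pass: the <p> tags are now real elements and a path can reach them.
    return inner.findtext("./p")


print(first_paragraph(XML))
```

This only works when the embedded text is well-formed XML apart from the missing root. For real-world HTML (unclosed tags, undeclared entities such as &nbsp;), a lenient parser such as the TagSoup mentioned above would still be needed.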

You could also treat it as a string:

<xsl:stylesheet version="1.0"
  xmlns:saxon="http://saxon.sf.net/"
  exclude-result-prefixes="saxon"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Root>
      <!--xsl:value-of select="saxon:parse(/books/book/content)"/-->
      <xsl:for-each select="books/book/content">
        <xsl:value-of select="
          substring-before(
          substring-after( . , '&gt;' ), '&lt;' ) "/>
      </xsl:for-each>
    </Root>
  </xsl:template>
</xsl:stylesheet>