How to retrieve an Element mixed children as text (JDOM)

开发者 (Developer) https://www.devze.com 2023-03-02 03:48 Source: the web

Related topics: jdom xml

I have an XML like the following:

<documentation>
    This value must be <i>bigger</i> than the other.
</documentation>

Using JDOM, I can get the following text structures:

Document d = new SAXBuilder().build( new StringReader( s ) );
System.out.printf( "getText:          '%s'%n", d.getRootElement().getText() );
System.out.printf( "getTextNormalize: '%s'%n", d.getRootElement().getTextNormalize() );
System.out.printf( "getTextTrim:      '%s'%n", d.getRootElement().getTextTrim() );
System.out.printf( "getValue:         '%s'%n", d.getRootElement().getValue() );

which give me the following outputs:

getText:          '
    This value must be  than the other.
'
getTextNormalize: 'This value must be than the other.'
getTextTrim:      'This value must be  than the other.'
getValue:         '
    This value must be bigger than the other.
'

What I really wanted was to get the content of the element as a string, namely, "This value must be <i>bigger</i> than the other.". getValue() comes close but removes the <i> tag. I guess I wanted something like innerHTML for XML documents...

Should I just use an XMLOutputter on the contents? Or is there a better alternative?

In JDOM pseudocode:

for Object o in d.getRootElement().getContent()
   if o instanceof Element
      print <o.getName()>o.getText()</o.getName()>
   else // it's a Text node
      print o.getText()

However, as Prashant Bhate wrote, content.getText() returns only the immediate text of a node, so this only works when the child elements are leaf elements with plain text content.
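On the XMLOutputter part of the question: here is a minimal sketch of that approach, assuming JDOM 2 (the org.jdom2 packages; in JDOM 1.x the packages are org.jdom). The class name InnerXml and the helper innerXml are made up for illustration. It passes the element's content list to XMLOutputter.outputString, so nested child elements are serialized along with the text, which the loop above does not handle.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.XMLOutputter;

// Hypothetical helper class, not part of JDOM.
public class InnerXml {

    /** Serializes the children of an element (text and elements), without the element's own tags. */
    static String innerXml(Element element) {
        // A default XMLOutputter uses the raw format, so whitespace is kept as parsed.
        return new XMLOutputter().outputString(element.getContent());
    }

    public static void main(String[] args) throws JDOMException, IOException {
        String s = "<documentation>\n"
                 + "    This value must be <i>bigger</i> than the other.\n"
                 + "</documentation>";
        Document d = new SAXBuilder().build(new StringReader(s));
        // trim() drops the leading and trailing whitespace around the mixed content.
        System.out.printf("innerXml: '%s'%n", innerXml(d.getRootElement()).trim());
    }
}
```

Because the output is re-serialized from the parsed tree, it may differ from the original source in small ways (entity escaping, attribute quoting), so treat it as an equivalent of innerHTML, not a byte-for-byte copy.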

Jericho HTML is great for this sort of task. You can accomplish exactly what you're trying to do with a code block like this:

String snippet = new Source(html).getFirstElement().getContent().toString();

It's also great for working with HTML in general because it doesn't try to force it into being XML...it deals with it much more leniently.

I'd say you should change your document to

<documentation>
  <![CDATA[This value must be <i>bigger</i> than the other.]]>
</documentation>

in order to adhere to the XML specification. Otherwise <i> would be considered a child element of <documentation> and not text content.
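To illustrate the CDATA suggestion, here is a small sketch, again assuming JDOM 2; the class name CdataText and the method text are hypothetical. Once the markup is wrapped in a CDATA section, the parser treats it as character data, so getTextTrim() returns it with the tags intact.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

// Hypothetical demo class, not part of JDOM.
public class CdataText {

    /** Parses the XML and returns the root element's trimmed text, CDATA included. */
    static String text(String xml) throws JDOMException, IOException {
        Document d = new SAXBuilder().build(new StringReader(xml));
        return d.getRootElement().getTextTrim();
    }

    public static void main(String[] args) throws JDOMException, IOException {
        String s = "<documentation>\n"
                 + "  <![CDATA[This value must be <i>bigger</i> than the other.]]>\n"
                 + "</documentation>";
        // The <i> tags are plain characters here, not a child element.
        System.out.printf("getTextTrim: '%s'%n", text(s));
    }
}
```

The trade-off is that <i> is no longer part of the document tree, so it cannot be queried or validated as an element.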
