I have an XML like the following:
<documentation>
This value must be <i>bigger</i> than the other.
</documentation>
Using JDOM, I can get the following text structures:
Document d = new SAXBuilder().build( new StringReader( s ) );
System.out.printf( "getText: '%s'%n", d.getRootElement().getText() );
System.out.printf( "getTextNormalize: '%s'%n", d.getRootElement().getTextNormalize() );
System.out.printf( "getTextTrim: '%s'%n", d.getRootElement().getTextTrim() );
System.out.printf( "getValue: '%s'%n", d.getRootElement().getValue() );
which give me the following outputs:
getText: '
This value must be than the other.
'
getTextNormalize: 'This value must be than the other.'
getTextTrim: 'This value must be than the other.'
getValue: '
This value must be bigger than the other.
'
What I really wanted was to get the content of the element as a string, namely "This value must be <i>bigger</i> than the other.". getValue() comes close but removes the <i> tag. I guess I wanted something like innerHTML for XML documents...
Should I just use an XMLOutputter on the contents? Or is there a better alternative?
In JDOM pseudocode:
for Object o in d.getRootElement().getContent()
    if o instanceof Element
        print <o.getName()>o.getText()</o.getName()>
    else // it's a text node
        print o.getText()
However, as Prashant Bhate wrote, content.getText() returns only the immediate text of an element, so this approach works only when the child elements are leaf elements with text content.
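To get around the leaf-only limitation, each child node can be serialized as a whole instead of rebuilt from its name and text. The sketch below is not JDOM: it swaps in the JDK's built-in DOM and identity Transformer so that it runs without extra jars. The class name InnerXml and the helpers innerXml and parseRoot are made up for illustration. In JDOM, the rough equivalent would be passing the element's content list to XMLOutputter, as the question suggests.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InnerXml {

    // Hypothetical helper: serializes every child node of the element, in
    // document order, so nested markup at any depth is preserved.
    public static String innerXml(Element element) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Without this, each serialized fragment would start with <?xml ...?>.
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            identity.transform(new DOMSource(n), new StreamResult(out));
        }
        return out.toString();
    }

    // Hypothetical helper: parses a string and returns the root element.
    public static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String s = "<documentation>\n"
                + "This value must be <i>bigger</i> than the other.\n"
                + "</documentation>";
        // trim() drops the line breaks that surround the text in the source.
        System.out.println(innerXml(parseRoot(s)).trim());
    }
}
```

Because whole nodes are serialized, this also handles children that contain further elements, which the name-plus-getText loop would flatten.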
Jericho HTML is great for this sort of task. You can accomplish exactly what you're trying to do with a code block like this:
String snippet = new Source(html).getFirstElement().getContent().toString();
It's also great for working with HTML in general because it doesn't try to force it into being XML; it handles it much more leniently.
I'd say you should change your document to
<documentation>
<![CDATA[This value must be <i>bigger</i> than the other.]]>
</documentation>
if the markup is meant to be text. The original document is still well-formed XML, but without the CDATA section <i> is parsed as a child element of <documentation> and not as text content.
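The difference is easy to see by reading the text of both variants. This is a minimal sketch using the JDK's built-in DOM parser rather than JDOM, so that it runs without extra jars. The class name CdataText and the helper rootText are made up for illustration. As far as I know, JDOM's getText() likewise includes CDATA content, but that is not verified here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CdataText {

    // Hypothetical helper: parses the XML and returns the root's text content.
    public static String rootText(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return root.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String plain = "<documentation>"
                + "This value must be <i>bigger</i> than the other."
                + "</documentation>";
        String cdata = "<documentation>"
                + "<![CDATA[This value must be <i>bigger</i> than the other.]]>"
                + "</documentation>";

        // <i> is a child element here, so only its text survives.
        System.out.println(rootText(plain));
        // Inside CDATA the markup is plain character data and is kept verbatim.
        System.out.println(rootText(cdata));
    }
}
```

The trade-off is that with CDATA the parser no longer sees <i> as an element, so it cannot be navigated or validated as one.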