How can I create a basic human readable plain text representation of XHTML using Java?

Developer https://www.devze.com 2023-02-13 04:27 Source: web
Given some simple XHTML, I'd like to create a human readable plain text version of it. This would involve removing all HTML tags, but adding or preserving some whitespace.

For example, this input:

<div>
<p>This is some text, some is <b>bold</b>.</p>
<ul>
  <li>Point one</li>
  <li>Point two</li>
</ul>
</div>

would become:

"This is some text, some is bold. Point one Point two"

(commas between the LIs would be ideal... :)


Try the Jericho HTML Parser. You can either strip all the tags or call on its "renderer" class, which tries to mimic the look (e.g. your bulleted lists would be tabbed).
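If pulling in a library is not an option, here is a rough sketch of the tag-stripping idea using only the JDK's built-in DOM parser rather than Jericho. It assumes the input is well-formed XML with a single root element, no DOCTYPE, and no HTML-only entities such as &nbsp;. The class name XhtmlToText, the method toPlainText, and the list of tags treated as block-level are made up for illustration. It walks the tree, keeps the text nodes, adds a space after block-level elements, and puts a comma after every LI that is followed by another LI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XhtmlToText {

    // Tags that get a trailing space so adjacent blocks do not run together.
    // This list is an assumption for the sketch, not a complete set.
    private static final Set<String> BLOCK_TAGS = new HashSet<>(Arrays.asList(
            "div", "p", "ul", "ol", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6"));

    /** Converts well-formed XHTML into one line of plain text. */
    public static String toPlainText(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Node root = builder.parse(new InputSource(new StringReader(xhtml))).getDocumentElement();
            StringBuilder out = new StringBuilder();
            append(root, out);
            // Collapse all runs of whitespace (including newlines) to single spaces.
            return out.toString().replaceAll("\\s+", " ").trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    // Depth-first walk: copy text nodes, then add separators after the element closes.
    private static void append(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            append(child, out);
        }
        String name = node.getNodeName().toLowerCase();
        if (name.equals("li") && hasFollowingListItem(node)) {
            // Drop trailing whitespace so the comma sits right after the item text.
            while (out.length() > 0 && Character.isWhitespace(out.charAt(out.length() - 1))) {
                out.setLength(out.length() - 1);
            }
            out.append(',');
        }
        if (BLOCK_TAGS.contains(name)) {
            out.append(' ');
        }
    }

    // True if the next element sibling (skipping text and comments) is another <li>.
    private static boolean hasFollowingListItem(Node li) {
        for (Node n = li.getNextSibling(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return "li".equalsIgnoreCase(n.getNodeName());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String xhtml = "<div>\n"
                + "<p>This is some text, some is <b>bold</b>.</p>\n"
                + "<ul>\n"
                + "  <li>Point one</li>\n"
                + "  <li>Point two</li>\n"
                + "</ul>\n"
                + "</div>";
        System.out.println(toPlainText(xhtml));
    }
}
```

For the sample input from the question this prints: This is some text, some is bold. Point one, Point two. Real-world HTML that is not strict XML will fail to parse here, which is where a lenient parser like Jericho is the better fit.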
