I have an input String
containing an HTML fragment like the following example:
I would have never thought that <b>those infamous tags</b>,
born in the <abbr title="Don't like that acronym">SGML</abbr> realm,
would make their way into the web of objects that we now experience.
Obviously, the real input is far more complex (including links, images, divs, and so on), and I would like to write a method with the following prototype:
String toXHTML(String html) {
// What do I have to write here ?
}
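One way to pin down what such a method should guarantee: XHTML has to be well-formed XML, so the result should at least survive an XML parser. Below is a minimal sketch of that check using only the JDK's built-in parser. The class name `WellFormedCheck`, the method name `isWellFormedFragment`, and the wrapping `<root>` element are hypothetical choices made for illustration, and the check covers well-formedness only, not validity against an XHTML DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class WellFormedCheck {

    // Returns true if the fragment parses as XML once wrapped in a single root element.
    static boolean isWellFormedFragment(String fragment) {
        // A fragment may contain several top-level nodes, so give it one artificial root.
        String wrapped = "<root>" + fragment + "</root>";
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(wrapped)));
            return true;
        } catch (SAXException e) {
            // Any parse failure means the fragment is not well-formed XML.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Closed tags and a self-closed <br />: acceptable as XHTML content.
        System.out.println(isWellFormedFragment("some <b>bold</b> text<br />"));
        // Typical tag soup: <b> and <br> are never closed.
        System.out.println(isWellFormedFragment("some <b>bold text<br>"));
    }
}
```

A check like this is only a test aid for whatever ends up inside toXHTML. Note that named HTML entities such as `&nbsp;` are rejected by this sketch, because no DTD is loaded to declare them.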
Without a description of the input format, I would assume it is some loosely HTML-like markup. Parsing such a mess by hand gets ugly quickly, but it looks like someone else has already done a good job of it:
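#!/usr/bin/env groovy
// Fetch JTidy through Grape
@Grapes(
    @Grab(group='jtidy', module='jtidy', version='4aug2000r7-dev')
)
import org.w3c.tidy.*

def tidy = new Tidy()
tidy.setXHTML(true)  // emit XHTML instead of plain HTML
// Read HTML from stdin and write the cleaned-up markup to stdout
tidy.parse(System.in, System.out)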
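Since the question asks for a Java method, the same idea could be sketched in Java roughly as follows. This assumes the JTidy jar is on the classpath. The class name `XhtmlConverter` is hypothetical, and the quiet and warning settings are a matter of taste rather than something the script above prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import org.w3c.tidy.Tidy;

// Hypothetical wrapper class; requires JTidy on the classpath.
class XhtmlConverter {

    static String toXHTML(String html) {
        Tidy tidy = new Tidy();
        tidy.setXHTML(true);          // ask for XHTML output
        tidy.setQuiet(true);          // suppress the summary JTidy prints
        tidy.setShowWarnings(false);  // suppress per-tag warnings

        // Assumption: the input is mostly ASCII. Character-encoding options differ
        // between JTidy releases, so they are deliberately left at their defaults here.
        ByteArrayInputStream in =
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        tidy.parse(in, out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toXHTML("I would have never thought that <b>those infamous tags</b>"));
    }
}
```

Expect JTidy to wrap a bare fragment in a complete document (doctype, head, body) by default, so if only the fragment is wanted, the body content would probably have to be extracted afterwards.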
Use the force, Riduidel.
Check this out: http://blog.foosion.org/2008/06/09/parse-html-the-groovy-way/ It might be what you are looking for.