I want to remove all HTML tags from a piece of text. But I don't want to parse the whole document with a DOM library, because building a DOM tree adds performance overhead and I don't care about the structure.
Is there a fast and efficient way to convert HTML to plain text?
If you don't need an in-memory DOM tree, use a parser with a SAX (event-driven) interface. Keep in mind that some real-world HTML is malformed and might need fault-tolerant parsing, though.
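For illustration, here is a minimal sketch of the event-driven approach in Python, assuming the standard library's `html.parser.HTMLParser` is acceptable as the SAX-style parser (it fires callbacks per tag and per text run, builds no tree, and tolerates unclosed tags). The names `TextExtractor` and `html_to_text` are made up for this example, and skipping `script` and `style` content is my own assumption about what "plain text" should exclude.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text runs as the parser emits events; no tree is built."""

    # Assumption: the contents of these elements are not wanted as text.
    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the collected runs and collapse whitespace to single spaces.
        return " ".join("".join(self._chunks).split())


def html_to_text(markup):
    """Hypothetical helper: strip tags from an HTML string."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


print(html_to_text("<p>Hello <b>world</b></p><script>var x = 1;</script>"))
# prints: Hello world
```

This sketch does not insert whitespace at tag boundaries, so text on either side of something like `<br>` is joined directly. If that matters for your input, you could append a space in `handle_starttag` for block-level tags.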