
TagSoup vs. Jsoup vs. HTML Parser vs. HotSax vs [closed]

Developer https://www.devze.com 2023-02-15 02:58 Source: Internet
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 9 years ago.

The abundance of HTML parsers to choose from (and stick with) is mind-boggling:

http://java-source.net/open-source/html-parsers

How do I choose one that best suits the following requirements:

  1. Mature (fewer bugs than the rest)
  2. Live and breathing (i.e. being maintained)
  3. Fast and resource-efficient (intended to run on Android)

Based on your experience, which HTML parser would you recommend (for meeting the above requirements) and why?


Well, I found the answer, which was given by @BalusC on a different thread:

  1. If you just want to use an XML-based tool to traverse it: JTidy.
  2. If you'd like to unit test the HTML: HtmlUnit.
  3. If you'd like to extract specific data from the HTML: Jsoup.

Thank you @BalusC.
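
To make the first option a bit more concrete: as far as I understand it, the appeal of JTidy is that it cleans up real-world HTML so that you can walk it with the standard `org.w3c.dom` API. Here is a minimal sketch of that traversal step only. It assumes the markup is already well-formed XHTML, so the JDK's built-in XML parser stands in for JTidy's cleanup step. The class name `XhtmlLinks` and the helper `extractLinks` are made up for illustration; they are not part of any of the libraries above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlLinks {

    // Hypothetical helper: collects the href of every <a> element
    // found in a well-formed XHTML string.
    public static List<String> extractLinks(String xhtml) {
        try {
            // Assumption: the input is already well-formed. With messy HTML
            // a tidying parser would have to produce the Document instead.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            NodeList anchors = doc.getElementsByTagName("a");
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element anchor = (Element) anchors.item(i);
                // Skip named anchors that carry no link target.
                if (anchor.hasAttribute("href")) {
                    hrefs.add(anchor.getAttribute("href"));
                }
            }
            return hrefs;
        } catch (Exception e) {
            // The plain XML parser rejects anything that is not well-formed.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<p><a href=\"http://java-source.net/open-source/html-parsers\">parsers</a></p>"
                + "<a name=\"top\">no href here</a>"
                + "</body></html>";
        System.out.println(extractLinks(page));
    }
}
```

The plain XML parser throws on unclosed tags and similar tag soup, which is exactly why a dedicated HTML parser is needed for pages found in the wild.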
