Which HTML DOM parser works best on Android?

Developer https://www.devze.com 2023-04-08 01:51 Source: Internet

I need to process some HTML pages in my Android App and I would prefer to use XPath for extracting the relevant information. For regular J2SE there are a lot of possible implementations for parsing re

jTidy
TagSoup
Jericho
NekoHTML
HTMLCleaner

(The list may be incomplete; it was extracted from https://stackoverflow.com/questions/2009897/recommend-an-alternative-to-jtidy)

However, it is hard to estimate whether and how well those libraries work on Android (library size, CPU and memory consumption).

Based on your experience, which library would you choose for Android?

OK, it looks like no one can answer that question, so I will have to check it myself.

jTidy

I downloaded the latest jTidy sources, compiled them, and added the resulting jar file as a library to my Android app. There were no problems using jTidy in my app (emulator and real phone). At runtime jTidy also works fine, but it does not seem to be a good fit for the limited Android environment: it is really slow. Judging by the Logcat output, even parsing a ~10 KB HTML file makes the garbage collector work heavily.
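To put numbers on an observation like this, a small timing harness can help. The following is only a rough sketch: `DomParser` is a hypothetical adapter interface, and `JDK_XML` is a stand-in that uses the platform's own XML parser (which only accepts well-formed input) so the example runs without any third-party jar. To compare real candidates, you would wrap jTidy or another library in the same interface. The heap figure is a crude indicator, since the garbage collector may run at any time.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseBenchmark {
    /** Hypothetical adapter: wrap whichever parser is under test. */
    interface DomParser {
        Document parse(byte[] html) throws Exception;
    }

    /** Stand-in using the platform XML parser; it only handles well-formed input. */
    static final DomParser JDK_XML = new DomParser() {
        @Override
        public Document parse(byte[] html) throws Exception {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(html));
        }
    };

    /** Builds a well-formed, ASCII-only test page of at least minBytes bytes. */
    static byte[] samplePage(int minBytes) {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int i = 0; sb.length() < minBytes; i++) {
            sb.append("<p class=\"row\">row ").append(i).append("</p>");
        }
        sb.append("</body></html>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Returns the average wall-clock time per parse, in milliseconds. */
    static double averageParseMillis(DomParser parser, byte[] page, int runs) {
        try {
            parser.parse(page); // warm-up run, not measured
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                parser.parse(page);
            }
            return (System.nanoTime() - start) / 1e6 / runs;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** Heap currently in use; a rough figure only. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        byte[] page = samplePage(10 * 1024);
        long before = usedHeapBytes();
        double ms = averageParseMillis(JDK_XML, page, 20);
        long after = usedHeapBytes();
        System.out.printf("size=%d B, avg parse=%.2f ms, heap delta=%d B%n",
                page.length, ms, after - before);
    }
}
```

Running the same harness with each candidate library on the same page, on a real device rather than the emulator, should give a fairer comparison than watching Logcat alone.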

HTMLCleaner

In my experience HTMLCleaner also works well on Android; the library is relatively small (106 KB for v2.2). However, the parsed DOM it creates is not as expected: HTMLCleaner inserts, for example, additional <span> elements into the DOM. This may be OK if you want to display the result as an HTML file, but for my use case (extracting information via XPath expressions) it is a no-go!
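A minimal sketch of why an inserted wrapper element matters for XPath. It uses only the standard `javax.xml.xpath` API and two hand-written, well-formed snippets. The markup, the `price` id and the position of the extra <span> are assumptions made up for illustration; they are not actual HTMLCleaner output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathSpanDemo {
    // Hypothetical page as the author wrote it.
    static final String AS_WRITTEN =
            "<html><body><div id=\"price\">42</div></body></html>";
    // Same page with an extra <span> wrapper, as a parser might insert it.
    static final String WITH_SPAN =
            "<html><body><span><div id=\"price\">42</div></span></body></html>";

    /** Parses well-formed markup and returns the XPath result as a string. */
    static String eval(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            // evaluate() returns an empty string when nothing matches
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String strict = "/html/body/div[@id='price']"; // depends on exact nesting
        String tolerant = "//div[@id='price']";        // matches at any depth

        System.out.println("[" + eval(AS_WRITTEN, strict) + "]");  // prints [42]
        System.out.println("[" + eval(WITH_SPAN, strict) + "]");   // prints []
        System.out.println("[" + eval(WITH_SPAN, tolerant) + "]"); // prints [42]
    }
}
```

Expressions that use the descendant axis (`//`) tolerate such wrappers, but any expression that relies on exact parent/child nesting silently returns nothing once the tree changes.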

TagSoup

Not tested

Jericho

Not tested

NekoHTML

Not tested

JSoup

Not tested