Parse HTML in Android

Developer https://www.devze.com 2023-01-14 09:39 Source: web

I am attempting to parse HTML for specific data but am having issues with return characters (at least, I think that is the problem). Since I know beforehand what I am looking for, I am using a simple substring method to take the HTML apart.

Here is my parse method:

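public static void parse(String response, String[] hashItem, String[][] startEnd) throws Exception
{
    for (int i = 0; i < hashItem.length; i++)
    {
        // Keep everything after the start marker startEnd[i][0].
        String part = response.substring(response.indexOf(startEnd[i][0]) + startEnd[i][0].length());
        // Cut at the first end marker startEnd[i][1]; substring throws if the marker is absent.
        String value = part.substring(0, part.indexOf(startEnd[i][1]));
        DATABASE.setHash(hashItem[i], value);
    }
}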

Here is a sample of the HTML that is giving me issues:

<table cellspacing=0 cellpadding=2 class=smallfont>
<tr onclick="lu();" onmouseover="style.cursor='hand'">
<td class=bodybox nowrap>&nbsp;     21,773,177,147 $&nbsp;</td><td></td>
<td class=bodybox nowrap>&nbsp;        629,991,926 F&nbsp;</td><td></td>
<td class=bodybox nowrap>&nbsp;             24,537 P&nbsp;</td><td></td>
<td class=bodybox nowrap>&nbsp;                  0 T&nbsp;</td>
<td></td><td class=bodybox nowrap>&nbsp;RT&nbsp;</td>

There are hidden return characters in the HTML, but when I try to include them in the marker strings I search for, it doesn't work well, if at all. Is there a method, or perhaps a better way, to strip hidden characters from the HTML to make it easier to parse? Any help is greatly appreciated, as always.

If you want to make parsing very easy, try Jsoup.

This example downloads the page, parses it, and gets the text of each matching cell:

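import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Fetch and parse the page (replace the URL with the page you want to parse).
Document doc = Jsoup.connect("http://jsoup.org").get();

// Select every <td> whose class is "bodybox".
Elements tds = doc.select("td.bodybox");

for (Element td : tds) {
  String tdText = td.text(); // text content of the cell, without markup
}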

You can try XmlPullParser, which is available in Android. You can use a StringBuffer (or StringBuilder) to append the characters that appear between tags.

Try using a regex to extract the information you want: http://java.sun.com/developer/technicalArticles/releases/1.4regex/

You could even use it to remove the hidden characters. Or maybe use String.replace to remove the newline characters?
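As a rough illustration of the regex and String.replace suggestions, here is a minimal sketch using only java.util.regex. The class and method names (BodyboxCells, normalize, extract) are made up for this example, and the pattern assumes the cells look like the sample in the question, with an unquoted class=bodybox attribute. A regex like this is fragile against markup changes, so treat it as a starting point rather than a general HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for this example; not part of any library.
final class BodyboxCells {
    // Captures the content of each <td class=bodybox ...> cell.
    // DOTALL lets '.' match line breaks, so cells spanning lines still match.
    private static final Pattern CELL = Pattern.compile(
            "<td\\s+class=bodybox[^>]*>(.*?)</td>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Turns &nbsp; entities into spaces, then collapses every run of
    // whitespace (including \r and \n) into a single space.
    static String normalize(String raw) {
        return raw.replace("&nbsp;", " ").replaceAll("\\s+", " ").trim();
    }

    // Returns the cleaned text of every bodybox cell, in document order.
    static List<String> extract(String html) {
        List<String> cells = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            cells.add(normalize(m.group(1)));
        }
        return cells;
    }

    public static void main(String[] args) {
        String html = "<tr>\r\n"
                + "<td class=bodybox nowrap>&nbsp;     21,773,177,147 $&nbsp;</td><td></td>\r\n"
                + "<td class=bodybox nowrap>&nbsp;             24,537 P&nbsp;</td><td></td>\r\n"
                + "</tr>";
        System.out.println(extract(html)); // prints [21,773,177,147 $, 24,537 P]
    }
}
```

Because the return characters are collapsed after matching, the markers never need to contain them, which sidesteps the problem described in the question.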

As far as I know, you can also parse the HTML file using an XMLReader, for example. Check this article: http://www.ibm.com/developerworks/xml/library/x-andbene1/