
Using jsoup to extract data from an HTML table

Developer https://www.devze.com 2023-03-18 23:52 Source: Internet

I started using jsoup today for an Android app. I have a table that I need to extract data from, but from what I can see, it's going to be tough. I need some help; the HTML for the table is as below:

<TR BGCOLOR='#999999'>
      <TD ALIGN='left'><span class='S09W80'><font color=#DDDDDD>CODE</span></TD>
      <TD ALIGN='left'><span class='S09W80'><font color=#DDDDDD>SUBJECT NAME</span></TD>
      <TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>PERIOD FROM</span></TD>
      <TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>PERIOD TO</span></TD>
      <TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>ENROL DATE</span></TD>
      <TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>GRADE</span></TD>
</TR>

followed by repetitions of

<TR BGCOLOR='#FFFFFF'>
  <TD ALIGN='left'><span class='S09W50'>IT142</span></TD>
  <TD ALIGN='left'><span class='S09W50'>INTRODUCTION TO GRAPHICS DEVELOPMENT</span></TD>
  <TD ALIGN='right'><span class='S09W50'>21-FEB-11</span></TD>
  <TD ALIGN='right'><span class='S09W50'>17-JUN-11</span></TD>
  <TD ALIGN='right'><span class='S09W50'>22-FEB-11</span></TD>
  <TD ALIGN='center'><span class='S09W80'>B-</span></TD>
</TR>

But how do I use doc.select() here? What selector should I use?


Not really an Android question, but a CSS selector question. You can read more about it at http://www.w3.org/TR/CSS2/selector.html

Doing screen scraping like this is always tricky and there is no "right" solution.

You will need to perform multiple select steps.

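  1. Use a selector like "body > table > tr" and take the first element. This gives you the header TR element. (jsoup's HTML parser normally inserts an implicit TBODY inside a table, so a looser descendant selector such as "table tr" may be needed for the rows to match.)
  2. Validate that TR element: get its child elements and check that one of them has the text "SUBJECT NAME".
  3. The remaining TR elements can then be processed in order, reading one value per TD.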
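
The three steps above could be sketched roughly as follows. This is only a minimal sketch, not a definitive solution: the class name SubjectTableParser and the method parseRows are made up for illustration, the sample rows are assumed to sit inside a <table> element (the question only shows the rows), and jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name is not from the original post.
class SubjectTableParser {

    /** Returns the cell texts of every row that comes after the validated header row. */
    static List<String[]> parseRows(String html) {
        Document doc = Jsoup.parse(html);
        List<String[]> rows = new ArrayList<>();
        boolean headerSeen = false;

        // Step 1: select the rows. A descendant selector is used because
        // jsoup adds an implicit <tbody>, so "table > tr" would not match.
        for (Element tr : doc.select("table tr")) {
            Elements cells = tr.select("td");

            if (!headerSeen) {
                // Step 2: validate the header row by looking for a known label.
                for (Element cell : cells) {
                    if ("SUBJECT NAME".equals(cell.text())) {
                        headerSeen = true;
                    }
                }
                continue; // skip the header row and anything before it
            }

            // Step 3: process the data rows in order, one value per cell.
            String[] values = new String[cells.size()];
            for (int i = 0; i < cells.size(); i++) {
                values[i] = cells.get(i).text();
            }
            rows.add(values);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed sample input: the rows from the question, wrapped in a <table>.
        String html =
            "<table>"
            + "<tr bgcolor='#999999'>"
            + "<td><span class='S09W80'>CODE</span></td>"
            + "<td><span class='S09W80'>SUBJECT NAME</span></td>"
            + "<td><span class='S09W80'>PERIOD FROM</span></td>"
            + "<td><span class='S09W80'>PERIOD TO</span></td>"
            + "<td><span class='S09W80'>ENROL DATE</span></td>"
            + "<td><span class='S09W80'>GRADE</span></td>"
            + "</tr>"
            + "<tr bgcolor='#FFFFFF'>"
            + "<td><span class='S09W50'>IT142</span></td>"
            + "<td><span class='S09W50'>INTRODUCTION TO GRAPHICS DEVELOPMENT</span></td>"
            + "<td><span class='S09W50'>21-FEB-11</span></td>"
            + "<td><span class='S09W50'>17-JUN-11</span></td>"
            + "<td><span class='S09W50'>22-FEB-11</span></td>"
            + "<td><span class='S09W80'>B-</span></td>"
            + "</tr>"
            + "</table>";

        for (String[] row : parseRows(html)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

If the real page contains several tables, the selector would probably need to be narrowed (for example by the table's position or an attribute) so that only the subject table's rows are visited.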