
[JSOUP] Why does 1.6.x remove TD tags? Problems upgrading to 1.6.x

Developer https://www.devze.com 2023-03-30 05:40 Source: web
System.out.println(Jsoup.parseBodyFragment("<td>123</td>").html());

jsoup 1.5.2 OUTPUT:

<html>
 <head></head>
 <body>
  <table>
   <tbody>
    <tr>
     <td>123</td>
    </tr>
   </tbody>
  </table>
 </body>
</html>

jsoup 1.6.x (1.6.0 and 1.6.1) OUTPUT:

<html>
 <head></head>
 <body>
  123
 </body>
</html>

Why does 1.6.x remove the TD tags?

How can I get the jsoup 1.5.x output in 1.6.x?


In jsoup 1.6, I rewrote the HTML parser to implement the WHATWG HTML spec, which matches how browsers currently parse HTML.

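The impact here is that in 1.5, a bare <td> was enough to auto-vivify a <table>. Browsers don't actually work that way: outside a table, the spec treats a stray <td> start tag (and its end tag) as a parse error and ignores it, keeping only the text content, which is why you see just "123". So in 1.6 you'll need to update your HTML input to include the <table> tag.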

For example:

System.out.println(
  Jsoup.parseBodyFragment("<table><td>123</td></table>").html());

will produce:

<html>
 <head></head>
 <body>
  <table>
   <tbody>
    <tr>
     <td>123</td>
    </tr>
   </tbody>
  </table>
 </body>
</html>

Note that the <table><td> gets normalised to <table><tbody><tr><td>....
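If you can't easily change every input by hand, one option is to add the missing table markup just before calling parseBodyFragment. The sketch below is a hypothetical helper (OrphanCellFix and wrapOrphanCells are illustrative names, not part of jsoup), assuming your fragments start directly with the cell or row tag. It only inspects the first tag and passes anything else through unchanged, so mixed fragments may still need their own handling.

```java
/**
 * Hypothetical helper, not part of jsoup: wraps a fragment that starts with a
 * bare table cell or row in the <table> markup that jsoup 1.6+ (like a browser)
 * needs in order to keep it.
 */
final class OrphanCellFix {

    private OrphanCellFix() {}

    /** True if s starts with "<" + tag (case-insensitive) followed by whitespace, '>' or '/'. */
    private static boolean startsWithTag(String s, String tag) {
        String open = "<" + tag;
        if (!s.regionMatches(true, 0, open, 0, open.length())) {
            return false;
        }
        if (s.length() == open.length()) {
            return false; // "<td" with nothing after it is not a complete tag
        }
        char next = s.charAt(open.length());
        // Rejects longer names such as "<thead" when looking for "<th".
        return Character.isWhitespace(next) || next == '>' || next == '/';
    }

    /** Wraps a leading <td>/<th> in <table><tr>, and a leading row or row group in <table>. */
    static String wrapOrphanCells(String html) {
        String trimmed = html.trim();
        if (startsWithTag(trimmed, "td") || startsWithTag(trimmed, "th")) {
            return "<table><tr>" + trimmed + "</tr></table>";
        }
        for (String tag : new String[] {"tr", "tbody", "thead", "tfoot"}) {
            if (startsWithTag(trimmed, tag)) {
                return "<table>" + trimmed + "</table>";
            }
        }
        return html; // not a bare table part: leave the input unchanged
    }

    public static void main(String[] args) {
        // Pass the result to Jsoup.parseBodyFragment(...) instead of the raw input.
        System.out.println(wrapOrphanCells("<td>123</td>"));
        // prints <table><tr><td>123</td></tr></table>
    }
}
```

Feeding the wrapped string to Jsoup.parseBodyFragment(...).html() should then give the same table/tbody/tr/td output as above, because a <tr> directly inside a <table> gets an implied <tbody>.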

Hope this helps!

