How to get just the content of a post from a blog?

Developer (https://www.devze.com), 2023-03-27 06:23, source: Internet
I have just the URL of a post, like http://www.avc.com/a_vc/2011/08/html5-continued.html. Is there any way to get the content of this post? I mean, excluding menus, logos and advertisements.

Thank you very much!


If you want to scrape the site, first consider whether this is legal.

Then you can do that by getting the innerHTML (or, with jQuery, the .html()) of the appropriate element. In your case this is disqus_post_message.
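
Since the question is tagged Java, here is a rough sketch of the same idea on the Java side: walk the HTML and keep only the text inside the element with a given id. It uses the HTML parser that ships with the JDK (javax.swing.text.html), not one of the libraries listed below, and that parser is old and only tolerant of fairly simple markup. The class name PostTextSketch, the method textOfElement and the sample page are made up for illustration, and the id disqus_post_message is taken from this answer, so check it against the real page source.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class PostTextSketch {

    /** Returns the text found inside the first element whose id equals targetId. */
    static String textOfElement(String html, String targetId) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int depth = 0;        // > 0 while we are inside the target element
            private boolean done = false; // true once the target element has closed

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (done) {
                    return;
                }
                if (depth > 0) {
                    depth++; // nested element inside the target
                } else if (targetId.equals(attrs.getAttribute(HTML.Attribute.ID))) {
                    depth = 1; // entering the target element
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (depth > 0 && --depth == 0) {
                    done = true; // the target element has just closed
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    out.append(data).append(' ');
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical page; a real one would be downloaded from the post URL first.
        String page = "<html><body>"
                + "<div id=\"menu\">Home | About</div>"
                + "<div id=\"disqus_post_message\"><p>First paragraph.</p>"
                + "<p>Second paragraph.</p></div>"
                + "<div id=\"footer\">Advertisement</div>"
                + "</body></html>";
        System.out.println(textOfElement(page, "disqus_post_message"));
    }
}
```

This only collects text, not the inner markup, so links and images inside the post are dropped. For real-world pages a dedicated HTML parsing library is likely to cope better.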

As @bensiu noted, it would be easier to use the RSS feed.

Since you tagged Java, here are the libraries that can be useful:

  • HtmlParser for parsing the HTML
  • Rome for RSS
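
To illustrate the RSS route, here is a minimal sketch that looks up a post by its URL in an RSS 2.0 document and returns that item's description. To keep it self-contained it uses the XML parser bundled with the JDK instead of Rome. The class name FeedSketch, the method descriptionForLink and the sample feed are invented for the example, and it assumes the blog's feed carries the post body in <description>, which is not guaranteed (some feeds only include a summary there).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FeedSketch {

    /**
     * Returns the description of the item whose link equals postUrl,
     * or null if the feed has no such item.
     */
    static String descriptionForLink(String rssXml, String postUrl) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(rssXml)));
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                String link = firstChildText(item, "link");
                if (postUrl.equals(link)) {
                    return firstChildText(item, "description");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    /** Text of the first descendant element with the given tag name, or null. */
    private static String firstChildText(Element parent, String tagName) {
        NodeList nodes = parent.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Hypothetical feed; a real one would be downloaded from the blog's feed URL.
        String feed = "<rss version=\"2.0\"><channel><title>Example blog</title>"
                + "<item><title>A</title><link>http://example.com/a.html</link>"
                + "<description>Body of A</description></item>"
                + "<item><title>B</title><link>http://example.com/b.html</link>"
                + "<description><![CDATA[<p>Body of B</p>]]></description></item>"
                + "</channel></rss>";
        System.out.println(descriptionForLink(feed, "http://example.com/b.html"));
    }
}
```

The feed already separates each post from the menus and ads, which is why this tends to be simpler than scraping the page. A feed usually lists only recent posts, though, so an older URL may not be found.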