I have just the URL of a post, like http://www.avc.com/a_vc/2011/08/html5-continued.html. Is there any way to get the content of this post? I mean excluding menus, logos and advertisements.
Thank you very much!
If you want to scrape the site, first consider whether this is legal.
Then you can do that by getting the innerHTML (or, with jQuery, the .html()) of the appropriate element. In your case this is disqus_post_message.
As @bensiu noted, it would be easier to use the RSS feed.
Since you tagged Java, here are the libraries that can be useful:
- HtmlParser for parsing the HTML
- Rome for RSS
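
To give a feel for the RSS route, here is a minimal sketch that pulls the title and body of each post out of an RSS 2.0 document. It deliberately uses only the XML parser bundled with the JDK rather than Rome, so it runs without extra dependencies; Rome would give you a richer feed model. The class name FeedPosts, the helper names and the sample feed are made up for illustration, and fetching the real feed over HTTP is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: not part of Rome or HtmlParser.
class FeedPosts {

    // Returns one {title, description} pair per <item> element of an RSS 2.0 document.
    static List<String[]> extractItems(String rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The feed is untrusted input, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

        List<String[]> posts = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            posts.add(new String[] { childText(item, "title"), childText(item, "description") });
        }
        return posts;
    }

    // Text of the first element named `tag` under `parent`, or "" if there is none.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample feed; a real one would be downloaded from the blog.
        String sample = "<rss version=\"2.0\"><channel><title>Sample blog</title>"
                + "<item><title>Sample post</title>"
                + "<description><![CDATA[<p>Post body</p>]]></description></item>"
                + "</channel></rss>";
        for (String[] post : extractItems(sample)) {
            System.out.println(post[0] + ": " + post[1]);
        }
    }
}
```

The description usually holds the post body as HTML without the site's menus and ads, which is why the feed is the easier source. Some feeds put only a summary there, in which case you are back to parsing the page itself.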