I need to scrape web sites (with approval). Before I start writing my own scraper: what is the best tool or approach for scraping web sites that is both fast (multithreaded) and easy to learn?
Take a look at this recent blog post by Lee Holmes. He wrote a pretty cool screen scraper using PowerShell and the HTML Agility Pack.
Consider using TestPlan. It has a display-less browser mode for fast scraping. The scripting language is simple, and the basics are quick to learn.
TagSoup, a SAX-compliant parser written in Java, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short.
Details here: http://mercury.ccil.org/~cowan/XML/tagsoup/
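To give a feel for the event-driven (SAX-style) approach on messy markup, here is a minimal sketch. Note that it does not use TagSoup: it swaps in `html.parser` from the Python standard library as a stand-in, since that is also a tolerant, callback-based parser. The names `LinkCollector` and `extract_links` and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags as parser events arrive."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag, similar in spirit to a SAX
        # startElement callback. Tag and attribute names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Hypothetical helper: feed a document and return the hrefs found."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately sloppy markup: unclosed <p> and <b>, one unquoted attribute.
messy = '<p>See <a href=/docs>docs<p><b>and <a href="http://example.com/x">this</a>'
print(extract_links(messy))
```

The parser never builds a tree, so unclosed tags do not stop it; you just react to the events you care about. For multithreaded scraping you would create one parser instance per page rather than sharing one across threads.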
Have you taken a look at ScraperWiki? https://scraperwiki.com/