
Easiest way to scrape Google for URLs via my browser?

https://www.devze.com 2023-02-06 01:29 Source: web

I'd like to scrape all the URLs my searches return when searching for stuff via Google. I've tried writing a script, but Google did not like it, and adding cookie support and captcha handling was too tedious. I'm looking for something that, while I'm browsing through the Google search pages, will simply take all the URLs on the pages and put them in a .txt file or store them somehow. Does any of you know of something that will do that? Perhaps a Greasemonkey script or a Firefox addon? It would be greatly appreciated. Thanks!
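Since the question mentions Greasemonkey: a minimal userscript along these lines could do it. This is only a sketch; the `#ires h3 a` selector is a guess at Google's result markup (it matches the layout the PHP answer below also targets) and will likely need adjusting as Google changes its HTML.

```javascript
// ==UserScript==
// @name     Collect Google result URLs
// @include  http*://www.google.com/search*
// ==/UserScript==

// Pull the href out of each result anchor, skipping Google's own links
// and empty hrefs.
function collectLinks(anchors) {
  return anchors
    .map(function (a) { return a.href; })
    .filter(function (h) { return h && h.indexOf("google.") === -1; });
}

if (typeof document !== "undefined") {
  // "#ires h3 a" matched Google's result layout at the time; adjust as needed.
  var anchors = Array.prototype.slice.call(
    document.querySelectorAll("#ires h3 a")
  );
  // Log to the console; from here you could copy the list into a .txt file,
  // or POST it to a small local server that appends to a file.
  console.log(collectLinks(anchors).join("\n"));
}
```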


See the JSON/Atom Custom Search API.
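As a sketch of using that API: it serves results from https://www.googleapis.com/customsearch/v1, takes your own API key and custom search engine ID (`key` and `cx` below are placeholders for credentials you obtain from Google), and returns each result's URL in `items[].link`.

```javascript
// Build the request URL for the Custom Search JSON API.
// The key and cx arguments are placeholders for your own credentials.
function buildSearchUrl(key, cx, query) {
  return "https://www.googleapis.com/customsearch/v1" +
    "?key=" + encodeURIComponent(key) +
    "&cx=" + encodeURIComponent(cx) +
    "&q=" + encodeURIComponent(query);
}

// Extract the result URLs from a parsed API response.
function extractLinks(response) {
  return (response.items || []).map(function (item) { return item.link; });
}

// Example with a hard-coded response in the API's documented shape:
var sample = {items: [{link: "http://example.com/1"},
                      {link: "http://example.com/2"}]};
console.log(extractLinks(sample)); // two URLs
```

Fetching `buildSearchUrl(...)` with your credentials and feeding the JSON body to `extractLinks` gives you the URL list without scraping HTML at all.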


I've done something similar for Google Scholar, where there's no API available. My approach was basically to create a proxy web server (a Java web app on Tomcat) that would fetch the page, do something with it, and then show it to the user. This is a 100% functional solution but requires quite a bit of coding. If you are interested I can go into more detail and put up some code.
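This is not the author's Java code, but the "fetch the page, do something with it" step of that proxy idea can be sketched in a few lines. The hypothetical `extractHrefs` below pulls every anchor URL out of fetched HTML with a regex; a real implementation would use a proper HTML parser instead.

```javascript
// Rough sketch of the processing step of a scraping proxy: given the
// fetched page's HTML, collect the href of every <a> tag. Regex-based
// HTML extraction is fragile; it is only meant to illustrate the idea.
function extractHrefs(html) {
  var re = /<a\s[^>]*href="([^"]+)"/gi;
  var links = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    links.push(m[1]); // the captured href value
  }
  return links;
}
```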


Google search results are very easy to scrape. Here is an example in PHP.

<?php
// A trivial example of scraping a Google results page.
// Note: Google's markup changes often, so the XPath below may need
// updating, and automated requests may be blocked or captcha-challenged.
$html = file_get_contents("http://www.google.com/search?q=pokemon");

$dom = new DOMDocument();
@$dom->loadHTML($html); // suppress warnings from Google's non-well-formed HTML
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//div[@id='ires']//h3//a") as $node) {
    echo $node->getAttribute("href") . "\n";
}
?>


You may try the IRobotSoft bookmark addon at http://irobotsoft.com/bookmark/index.html
