PHP: Fetch Google's first result

Developer https://www.devze.com 2023-03-17 07:01 Source: Internet
I had this code that helped me fetch the URL of an actor's page on IMDb by searching for "IMDB+Actor name" and giving me the URL of his IMDb profile page.

It worked fine until 5 minutes ago, and then all of a sudden it stopped working. Is there a daily limit on Google queries (I would find that very strange!), or did I alter something in my code without noticing (in which case, can you spot what's wrong)?

function getIMDbUrlFromGoogle($title){
    $url = "http://www.google.com/search?q=imdb+" . rawurlencode($title);
    echo $url;
    $html = $this->geturl($url);
    $urls = $this->match_all('/<a href="(http:\/\/www\.imdb\.com\/name\/nm.*?)".*?>.*?<\/a>/ms', $html, 1);

    if (!isset($urls[0]))
        return NULL;
    else
        return $urls[0]; //return first IMDb result

}

function geturl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1");
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

function match_all($regex, $str, $i = 0)
{
    if(preg_match_all($regex, $str, $matches) === false)
        return false;
    else
        return $matches[$i];
}


They will, in fact, throttle you if you make queries too fast or make too many. For example, their SOAP API limits you to 1,000 queries a day. Either throw in a wait, or use something that invites this kind of use, such as Yahoo's BOSS: http://developer.yahoo.com/search/boss/

ETA: I really, really like BOSS, and I'm a Google fangirl. It gives you a lot of resources, clean data, and flexibility. Google never gave us anything like this, which is too bad.
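
For the "throw in a wait" option, here is a minimal sketch of spacing out requests. The function names and the 2-second default are made up for illustration; nothing in this thread says what interval Google actually tolerates, so treat the number as a placeholder.

```php
<?php
// Hypothetical helper, not part of the original code.
// Returns how many microseconds to wait so that at least
// $minIntervalSec seconds separate two consecutive requests.
function throttleDelayMicros(?float $lastRequestAt, float $now, float $minIntervalSec): int
{
    if ($lastRequestAt === null) {
        return 0; // first request, nothing to wait for
    }
    $remaining = $minIntervalSec - ($now - $lastRequestAt);
    return $remaining > 0 ? (int) round($remaining * 1000000) : 0;
}

// Hypothetical wrapper: call this right before each search request.
// The 2.0 second default is an assumed value, not a documented limit.
function waitBeforeRequest(?float &$lastRequestAt, float $minIntervalSec = 2.0): void
{
    $delay = throttleDelayMicros($lastRequestAt, microtime(true), $minIntervalSec);
    if ($delay > 0) {
        usleep($delay); // sleep for the remaining part of the interval
    }
    $lastRequestAt = microtime(true); // remember when this request went out
}
```

You would keep one `$lastRequestAt` variable (initially `null`) and call `waitBeforeRequest($lastRequestAt)` before every `geturl()` call. This only slows you down; it does not lift a daily cap.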


There is an API for Google search, and it is limited to 100 queries/day! Also, fetching Google search results with any kind of automated tool is not allowed, according to Google's guidelines.
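
A rough sketch of what going through an API instead of scraping might look like. This assumes the Custom Search JSON API (endpoint `https://www.googleapis.com/customsearch/v1`, parameters `key`, `cx`, `q`, and a response with an `items` array whose entries carry a `link`); the answer above does not name a specific API, so that choice is my assumption. `$apiKey` and `$engineId` are placeholders you would have to obtain yourself, and both function names are hypothetical.

```php
<?php
// Assumption: Custom Search JSON API. Builds the request URL only;
// fetching it can be done with the existing geturl() method.
function buildSearchApiUrl(string $apiKey, string $engineId, string $query): string
{
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query([
        'key' => $apiKey,   // placeholder API key
        'cx'  => $engineId, // placeholder search engine ID
        'q'   => $query,
    ]);
}

// Hypothetical helper: returns the first IMDb name-page link found in the
// JSON response body, or null if the body is not usable or has no such link.
function firstImdbNameUrl(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data) || empty($data['items']) || !is_array($data['items'])) {
        return null;
    }
    foreach ($data['items'] as $item) {
        $link = $item['link'] ?? '';
        if (is_string($link) && preg_match('#^https?://www\.imdb\.com/name/nm\d+#', $link)) {
            return $link;
        }
    }
    return null;
}
```

Working with decoded JSON also means you no longer depend on a regex over Google's HTML, which can change without notice.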


Google's webpage is designed for use by humans; they will shut you out if they notice you heavily using it in an automated way. Their Terms of Service are clear that what you are doing is not allowed. (Though they no longer seem to link directly to that from the search results page, much less their front page, and in any case AIUI at least some courts have upheld that putting a link on a page isn't legally binding.)

They want you to use their API and, if you use it heavily, to pay for it (the prices aren't exorbitant).

That said, why aren't you going directly to IMDb?
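
As for the "limit or my code?" part of the question, one way to tell the two apart is to look at the HTTP status code next to the body. Below is a small sketch; `diagnoseFetch` and its return labels are hypothetical. I don't know which status Google sends when it shuts you out, so this simply treats anything other than 200 as "refused". Since your `geturl()` doesn't follow redirects, that bucket would include redirects as well.

```php
<?php
// Hypothetical helper, not from the original post: classify one fetch attempt.
// $status: HTTP status code, e.g. curl_getinfo($ch, CURLINFO_HTTP_CODE)
//          read after curl_exec() and before curl_close().
// $body:   whatever curl_exec() returned (false on transport failure).
// $regex:  the pattern you expect to match in a normal results page.
function diagnoseFetch(int $status, $body, string $regex): string
{
    if ($body === false || $status === 0) {
        return 'transport-error'; // cURL itself failed (timeout, DNS, ...)
    }
    if ($status !== 200) {
        return 'refused';         // server answered, but not with a normal page
    }
    if (preg_match($regex, $body) !== 1) {
        return 'no-match';        // page arrived, but the pattern found nothing
    }
    return 'ok';
}
```

If you get 'refused', it points towards being blocked or throttled; 'no-match' suggests the page markup (or your regex) changed instead.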
