Selecting a specific div from an external webpage using cURL

Developer (https://www.devze.com), 2022-12-25 14:15. Source: the web
Hi, can anyone help me select a specific div from the content of a webpage?

Let's say I want to get the div with id="wrapper_content" from the webpage http://www.test.com/page3.php.

My current code, which doesn't work, looks something like this:

//REG EXP.
$s_searchFor = '@^/.dont know what to put here..@ui';    

//CURL
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, 'http://www.test.com/page3.php');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
if(!preg_match($s_searchFor, $ch))
{
  $file_contents = curl_exec($ch);
}
curl_close($ch);

// display file
echo $file_contents;

So I'd like to know how I can use a regular expression to find a specific div, and how to discard the rest of the webpage so that $file_contents contains only that div.


HTML isn't a regular language, so regex is the wrong tool for it. Instead, I would recommend an HTML parser such as Simple HTML DOM or PHP's DOM extension.
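To make the regex problem concrete, here is a small sketch. The sample markup and the pattern are made up for illustration; the point is only that a typical non-greedy pattern stops at the first closing tag it sees, so a nested div cuts the match short.

```php
<?php
// Hypothetical sample markup: the target div contains a nested div.
$html = '<div id="wrapper_content"><div class="inner">Hello</div><p>World</p></div>';

// A naive non-greedy pattern stops at the FIRST </div>,
// which closes the nested div, not the wrapper.
$pattern = '@<div id="wrapper_content">.*?</div>@s';

$found = preg_match($pattern, $html, $matches);
$naive_match = $found ? $matches[0] : null;

// The <p>World</p> part is missing from the match.
echo $naive_match, "\n";
```

Making the pattern greedy does not really help either, since it would then run on to the last closing div on the page.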

If you were going to use Simple HTML DOM you would do something like the following:

$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
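If you would rather not add a third-party file, roughly the same thing can be done with PHP's built-in DOM extension. This is only a sketch: extract_by_id is a helper name I made up, the sample markup is invented, and it assumes the id contains no double quote, since it is pasted straight into the XPath expression.

```php
<?php
// Return the outer HTML of the div with the given id, or null if there is none.
// extract_by_id is a hypothetical helper, not part of any library.
function extract_by_id(string $html, string $id): ?string
{
    $doc = new DOMDocument();

    // Real pages are rarely valid HTML, so keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumption: $id contains no double quote.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // saveHTML() with a node argument serializes that node and its children.
    return $doc->saveHTML($nodes->item(0));
}

// Invented sample page standing in for the real $file_contents.
$sample = '<html><body><div id="header">Nav</div>'
        . '<div id="wrapper_content"><div class="inner">Hello</div><p>World</p></div>'
        . '</body></html>';

$result = extract_by_id($sample, 'wrapper_content');
echo $result, "\n";
```

Unlike a regex, this copes with nested divs, because the parser tracks which closing tag belongs to which element.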

Even if you used regex, your code still wouldn't work correctly. It passes the cURL handle $ch to preg_match instead of the page contents, so you need to fetch the page with curl_exec before you can match anything.

//wrong
if(!preg_match($s_searchFor, $ch)){
    $file_contents = curl_exec($ch);
}

//right
$file_contents = curl_exec($ch); //get the page contents
if (preg_match($s_searchFor, $file_contents, $matches)) { //match the element
    $file_contents = $matches[0]; //keep only the matched element
}
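One more thing to watch for in the fetch step: curl_exec returns false when the request fails, so it is worth checking the result before handing it to preg_match or a parser. A minimal sketch, assuming the cURL extension is loaded; fetch_page is a helper name I made up, and the timeout values are arbitrary.

```php
<?php
// Fetch a URL and return its body, or null on any failure.
// fetch_page is a hypothetical helper, not part of cURL.
function fetch_page(string $url, int $timeout = 5): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout * 2);    // cap on the whole transfer
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
    curl_setopt($ch, CURLOPT_FAILONERROR, true);        // treat HTTP status >= 400 as a failure

    $body = curl_exec($ch); // false on failure

    return $body === false ? null : $body;
}
```

You would call it as fetch_page('http://www.test.com/page3.php') and only go on to extract the div if the result is not null.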


include('simple_html_dom.php');
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);

Download simple_html_dom.php


Check out Hpricot (a Ruby HTML parser); it lets you elegantly select sections of a document.

First you would use curl to get the document, then use Hpricot to get the part you need.
