PHP DOM Parser only working for some pages

Developer (https://www.devze.com), 2023-02-13 06:56. Source: the web
I'm using Simple HTML DOM (http://simplehtmldom.sourceforge.net/). Both with the bundled examples and when I try to scrape certain sites myself, only some pages return results.

Here is my code:

include_once('../../simple_html_dom.php');

// Create DOM from URL or file
$website = 'http://www.digg.com/';
$html = file_get_html($website);

// Find all images 
foreach($html->find('img') as $element) 
   echo "<img src=\"" . $website . $element->src . "\"" . '<br>';

This shows a bunch of thumbnails, but they are pretty much blank, and not all of the thumbnails are returned.

Is it because these sites have some sort of .htaccess restrictions on visitors? This happens with multiple websites.


You're assuming that $element->src is always relative to $website, which may well not be the case.

For example, $element->src could already be http://www.digg.com/image.jpg. In that case $website . $element->src becomes http://www.digg.com/http://www.digg.com/image.jpg, which won't load.

Try

include_once('../../simple_html_dom.php');

// Create DOM from URL or file
$website = 'http://www.digg.com/';
$html = file_get_html($website);

// Find all images 
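foreach($html->find('img') as $element) {
   // Strip leading slashes so we don't end up with a double slash
   $src = ltrim($element->src, '/');
   // Strip the site prefix so we don't end up with a doubled URL
   $src = str_replace($website, "", $src);

   // Close the img tag before the line break
   echo "<img src=\"" . $website . $src . "\"><br>";
}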
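
The snippet above only covers images hosted under $website itself. If a page serves its images from another host, or uses protocol-relative URLs such as //cdn.example.com/a.png, the prefix trick will still produce broken links. A rough sketch of a more general approach follows. Note that resolve_img_src is a hypothetical helper written for this answer, not part of simple_html_dom, and it does not normalize "../" segments:

```php
<?php
// Hypothetical helper (not part of simple_html_dom): turn an <img> src
// into an absolute URL, given the URL of the page it was found on.
function resolve_img_src(string $base, string $src): string
{
    $src = trim($src);

    // Already absolute (http://, https://, data:, ...): use it as-is.
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $src)) {
        return $src;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $origin = $scheme . '://' . $host . $port;

    // Protocol-relative (//cdn.example.com/a.png): borrow the page's scheme.
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;
    }

    // Root-relative (/img/a.png): append to scheme + host.
    if (strpos($src, '/') === 0) {
        return $origin . $src;
    }

    // Path-relative (img/a.png): append to the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}
```

Inside the foreach loop you would then echo resolve_img_src($website, $element->src) as the src value, ideally passed through htmlspecialchars() first, instead of concatenating $website and $src by hand.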