I'm using simple_html_dom (http://simplehtmldom.sourceforge.net/) and noticed that, both with the bundled examples and when scraping certain sites, only some of them return results. Here's my code:
include_once('../../simple_html_dom.php');
// Create DOM from URL or file
$website = 'http://www.digg.com/';
$html = file_get_html($website);
// Find all images
foreach($html->find('img') as $element)
echo "<img src=\"" . $website . $element->src . "\"" . '<br>';
This shows a bunch of thumbnails, but they are pretty much blank (and it's not returning all thumbnails).
Is it because they have some sort of .htaccess restriction in place against scrapers? This happens with multiple websites.
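On the restriction question first: some sites do serve different (or empty) markup to clients that don't look like a browser. One way to rule that out is to fetch the page yourself with cURL, send a browser-like User-Agent, and hand the raw markup to str_get_html(). A minimal sketch; the User-Agent string here is just an example placeholder:
include_once('../../simple_html_dom.php');
$website = 'http://www.digg.com/';
$ch = curl_init($website);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; ExampleBot/1.0)'); // example UA
$markup = curl_exec($ch);
curl_close($ch);
// Parse the fetched markup instead of letting file_get_html() do the request
$html = str_get_html($markup);
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}
If this returns noticeably more images than file_get_html() does, the site is treating the default PHP request differently.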
You're assuming that $element->src is always relative to $website, which it easily may not be.
For example, $element->src could already be http://www.digg.com/image.jpg, in which case $website . $element->src would produce http://www.digg.com/http://www.digg.com/image.jpg, and that wouldn't work.
Try this instead:
include_once('../../simple_html_dom.php');
// Create DOM from URL or file
$website = 'http://www.digg.com/';
$html = file_get_html($website);
// Find all images
foreach ($html->find('img') as $element) {
    // don't want double slashes
    $src = ltrim($element->src, '/');
    // don't want the URL doubled up
    $src = str_replace($website, "", $src);
    echo "<img src=\"" . $website . $src . "\">" . '<br>';
}
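A more robust approach is to branch on what kind of src you actually got, rather than stripping prefixes. Here's a minimal sketch, assuming you only need to handle absolute http(s) URLs, protocol-relative URLs, and paths resolved against the site root (it does not resolve paths relative to the current page's directory):
include_once('../../simple_html_dom.php');
$website = 'http://www.digg.com/';
$html = file_get_html($website);
foreach ($html->find('img') as $element) {
    $src = $element->src;
    if (preg_match('#^https?://#i', $src)) {
        // Already absolute: use it as-is.
        $url = $src;
    } elseif (substr($src, 0, 2) === '//') {
        // Protocol-relative: borrow the page's scheme.
        $url = 'http:' . $src;
    } else {
        // Root-relative or bare filename: resolve against the site root.
        $url = $website . ltrim($src, '/');
    }
    echo "<img src=\"" . $url . "\">" . '<br>';
}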