How can I fetch all image src values into an array with file_get_contents(), with preg_match, or whatever?
You shouldn't use regex to parse HTML. Use a class like DOMDocument instead. DOMDocument has a getElementsByTagName method that retrieves all the img tags from the document you want to parse.
Here's an example that echoes the src of every image in the document:
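<?php
// Collect libxml's warnings about malformed real-world HTML instead of printing them.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML(file_get_contents('yourfilehere.html'));

// getElementsByTagName() returns a DOMNodeList that can be iterated directly.
foreach ($document->getElementsByTagName('img') as $image) {
    // getAttribute() returns '' when the tag has no src, instead of failing on null.
    echo $image->getAttribute('src'), '<br />';
}
?>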
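Since the question asks for an array rather than echoed output, here is a minimal sketch of the same DOMDocument approach wrapped in a function. The name extractImageSources and the sample HTML are my own, purely for illustration; in practice you would pass in the result of file_get_contents().

```php
<?php
// Hypothetical helper (not part of DOMDocument): returns every non-empty
// src attribute of the <img> tags found in an HTML string.
function extractImageSources(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Collect parser warnings silently, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($document->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');
        if ($src !== '') {          // skip <img> tags without a src
            $sources[] = $src;
        }
    }
    return $sources;
}

// Sample input, assumed for the example only.
$html = '<p><img src="a.png"><img alt="no source"><img src="/img/b.jpg"></p>';
print_r(extractImageSources($html));
```

The values come back exactly as written in the markup, so relative paths stay relative.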
It's more reliable and simpler to use phpQuery or SimpleHTMLparser (more elaborate). But for basic extraction, when you are just searching for src= attributes, that is overkill and a regular expression is in fact sufficient:
preg_match_all('/<img[^>]+src\s*=[\'\"\s]?([^<\'\"]+)/ims', file_get_contents($url), $uu);
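To make it clearer where the results end up: preg_match_all fills its third argument with one sub-array per capture group, so the src values are in index 1. A small sketch with a made-up HTML string standing in for file_get_contents($url):

```php
<?php
// Sample markup, assumed for the example; normally this would be
// the string returned by file_get_contents($url).
$html = '<p><img src="a.png" alt=""> <IMG class="x" src=\'/img/b.jpg\'></p>';

preg_match_all('/<img[^>]+src\s*=[\'\"\s]?([^<\'\"]+)/ims', $html, $uu);

// $uu[0] holds the full matches, $uu[1] the first capture group (the src values).
$sources = $uu[1];
print_r($sources);
```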
Note that the regex yields the src values exactly as written in the page, which are mostly relative paths rather than full URLs. So they need postprocessing, whereas phpQuery, IIRC, has a shortcut for normalizing them.
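As a rough idea of what that postprocessing could look like, here is a simplified sketch that resolves a src against the URL of the page it came from. The function resolveUrl is hypothetical (it is not a PHP built-in and not phpQuery's shortcut), it assumes the base URL is absolute, and it is not a complete RFC 3986 resolver.

```php
<?php
// Hypothetical helper: turns a possibly relative $src into an absolute URL,
// using $baseUrl (the page the HTML was fetched from) as the reference.
function resolveUrl(string $baseUrl, string $src): string
{
    // Already absolute: starts with a scheme such as http:, https: or data:.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $src)) {
        return $src;
    }

    $base   = parse_url($baseUrl);
    $scheme = $base['scheme'] ?? 'http';

    // Protocol-relative URL, e.g. //cdn.example.com/x.png
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;
    }

    $origin = $scheme . '://' . ($base['host'] ?? '')
        . (isset($base['port']) ? ':' . $base['port'] : '');

    if (strpos($src, '/') === 0) {
        $path = $src;                                   // root-relative
    } else {
        $basePath = $base['path'] ?? '/';
        // Directory part of the base path, including the trailing slash.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $src;
    }

    // Keep any query string or fragment out of the segment clean-up.
    $cut    = strcspn($path, '?#');
    $suffix = substr($path, $cut);
    $path   = substr($path, 0, $cut);

    // Collapse "." and ".." segments without climbing above the root.
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            if (count($segments) > 1) {
                array_pop($segments);
            }
        } elseif ($segment !== '.') {
            $segments[] = $segment;
        }
    }

    return $origin . implode('/', $segments) . $suffix;
}

echo resolveUrl('http://example.com/blog/post.html', '../img/a.png'), "\n";
```

You would map this over the extracted array, for example with array_map and the URL you passed to file_get_contents().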