I want to open a URL and use a regex to extract all the image URLs from the page. Then I want to cURL each of them and check its size. In the end I want to get the biggest one. How do I do this?
You could start by fetching the page with cURL and saving the HTML in a variable.
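A minimal sketch of that first step, assuming the cURL extension is installed. The helper name fetchPage, the timeout value and the example URL are made up for illustration:

```php
<?php
// Hypothetical helper: download $url and return the body, or null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed timeout, in seconds
    $html = curl_exec($ch);                         // string on success, false on failure
    return $html === false ? null : $html;
}

// Usage (http://example.com/ is a placeholder):
// $html = fetchPage('http://example.com/');
```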
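Then you could apply a regex like this one: <img[^>]*src=['"](.*?)['"]
Using [^>]* keeps the match inside a single tag, and leaving out a trailing > lets src be followed by other attributes.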
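Here is a rough sketch of applying a pattern of this kind with preg_match_all(). The helper name extractImageSrcs and the sample HTML are made up, and the pattern is a slightly stricter variant that requires the closing quote to match the opening one. It will still be fooled by unquoted attributes or by markup inside comments and scripts.

```php
<?php
// Hypothetical helper: collect the src value of every <img> tag in $html.
function extractImageSrcs(string $html): array
{
    // [^>]*? keeps the match inside one tag; \s before "src" avoids matching
    // attributes such as data-src; \1 makes the closing quote match the opening one.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);
    return $matches[2]; // second capture group = the URL
}

// Sample input, invented for this example.
$html = '<p><img class="a" src="/img/a.png" alt=""><img src=\'http://example.com/b.jpg\'></p>';
print_r(extractImageSrcs($html));
```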
Check whether the source starts with http or is a relative link. If it's a relative link, you can prepend the URL of the page.
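A simplified sketch of that check. The helper name resolveUrl is hypothetical, and this version does not collapse "../" or "./" segments or handle data: URIs, so treat it as a starting point only:

```php
<?php
// Hypothetical helper: turn an <img> src into an absolute URL.
function resolveUrl(string $src, string $pageUrl): string
{
    // Already absolute (http:// or https://).
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }
    $parts  = parse_url($pageUrl);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '');
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    // Protocol-relative: //cdn.example.com/a.png
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;
    }
    // Root-relative: /img/a.png
    if (strpos($src, '/') === 0) {
        return $origin . $src;
    }
    // Path-relative: resolve against the directory of the page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}
```

For example, resolveUrl('pics/a.png', 'http://example.com/gallery/index.html') gives http://example.com/gallery/pics/a.png.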
Finally, get the size (pixel dimensions) of each image using getimagesize(): http://php.net/manual/en/function.getimagesize.php
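If "biggest" means largest resolution, something along these lines could pick the winner. The helper name largestByArea is made up, and comparing by width times height is just one possible criterion. Note that getimagesize() has to read image data from each URL, and remote URLs only work when allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: return the URL (or path) with the largest pixel area,
// or null if none of the entries could be read as an image.
function largestByArea(array $imageUrls): ?string
{
    $best = null;
    $bestArea = -1;
    foreach ($imageUrls as $url) {
        // getimagesize() returns false on failure; @ silences its warning.
        $info = @getimagesize($url);
        if ($info === false) {
            continue;
        }
        $area = $info[0] * $info[1]; // index 0 = width, index 1 = height
        if ($area > $bestArea) {
            $bestArea = $area;
            $best = $url;
        }
    }
    return $best;
}
```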
Use PHP's DOM extension to find the images.
I have not tested this code at all, but it should get you headed in the right direction.
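$urls = array();
// loadHTML() must be called on an instance; a static call is an error in PHP 8.
$dom = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML(YOUR_HTML); // YOUR_HTML is a placeholder for the fetched page source
$imgList = $dom->getElementsByTagName('img');
foreach ($imgList as $imgElement) {
    // Skip <img> tags that have no src attribute.
    if ($imgElement->hasAttribute('src')) {
        $urls[] = $imgElement->getAttribute('src');
    }
}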
If you want to get linked images, you can change 'img'/'src' to 'a'/'href'. But you will need to find a way to filter the list to get only images.
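One simple way to do that filtering is by file extension. This is only a sketch: the helper name filterImageLinks and the extension list are assumptions, and it will miss images served from URLs without an extension.

```php
<?php
// Hypothetical helper: keep only URLs whose path ends in a known image extension.
function filterImageLinks(array $urls): array
{
    // Assumed extension list; extend as needed.
    $extensions = array('jpg', 'jpeg', 'png', 'gif', 'webp');
    $images = array();
    foreach ($urls as $url) {
        // Look at the path only, so a query string like "?size=large" does not hide the extension.
        $path = parse_url($url, PHP_URL_PATH);
        if (!is_string($path)) {
            continue; // no path component (or an unparsable URL)
        }
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, $extensions, true)) {
            $images[] = $url;
        }
    }
    return $images;
}
```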
You did not say what your criterion for image size is, so I can't help you there. Do you want the maximum file size or the maximum resolution?
It might already be obvious by now: use a DOM parser, not regex. Just get all elements by tag name <img> and then, for each one, get the URL from its src attribute. To determine an image's size without downloading the entire image, you'd probably want to fire an HTTP HEAD request instead and read the Content-Length header from the response. The http_head() function (from the pecl_http extension) may be useful for this.
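If pecl_http is not available, the same idea can be sketched with the cURL extension instead. The helper names remoteFileSize and pickLargest and the timeout are made up for this example, and keep in mind that some servers do not send Content-Length for HEAD requests, in which case the size stays unknown.

```php
<?php
// Hypothetical helper: ask the server for the size of $url without
// downloading the body. Returns null when the size is unknown.
function remoteFileSize(string $url): ?int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // send HEAD instead of GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // do not echo the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed timeout, in seconds
    if (curl_exec($ch) === false) {
        return null;
    }
    // cURL reports -1 when the server sent no Content-Length header.
    $length = (int) curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    return $length >= 0 ? $length : null;
}

// Hypothetical helper: given [url => size or null], return the URL with the
// largest known size, or null if no size is known.
function pickLargest(array $sizesByUrl): ?string
{
    $best = null;
    foreach ($sizesByUrl as $url => $size) {
        if ($size !== null && ($best === null || $size > $sizesByUrl[$best])) {
            $best = $url;
        }
    }
    return $best;
}

// Usage (needs network access and the cURL extension):
// $sizes = array();
// foreach ($urls as $url) { $sizes[$url] = remoteFileSize($url); }
// echo pickLargest($sizes);
```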