simplehtmldom class and image

Developer https://www.devze.com 2023-01-13 06:53 Source: Internet
I am using the simplehtmldom class to get all images from a website,

and I am trying to get the width and height of each image returned by simplehtmldom.

What I am trying to accomplish here is: if an image's width is less than 50px, I don't want the image to be displayed.

I tried getimagesize(), but it often times out, I think due to the number of images.

Any ideas?

Thanks.


Using getimagesize() is very slow, especially if you're scraping a site and get many images. PHP has to download the entirety of each image BEFORE it can pass the data to getimagesize(), so if you're working on (for instance) a large photo gallery, you could be downloading many megabytes per image.

There are a few things you can do to speed up the process:

  1. Check the height/width attributes of the <img> tag and only grab images where either is larger than 50. They might not necessarily be accurate, as the web page creator could be stretching or shrinking the image, but it would save you from downloading accurately sized small images.

  2. Instead of fetching the images directly with getimagesize(), you could try fetching only the first couple hundred bytes of each, which will contain the image header information. For GIF and PNG images, the height/width sit at fixed offsets very near the beginning of the file. For JPEG images they are usually near the beginning too, though embedded metadata such as EXIF can push them further in. Either way, you'd save on file transfer overhead.

  3. Increase your script's execution time. Fetching all the images will naturally be a fairly slow process, and you'll most likely run up against PHP's max_execution_time.
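
Option 2 could be sketched roughly like this. The function names size_from_header() and fetch_head() are made up for illustration, and the 512-byte read and 5-second timeout are arbitrary guesses. Only GIF and PNG are handled, because their dimensions sit at fixed offsets; a JPEG would need a scan for its SOF marker, which this sketch does not attempt.

```php
<?php
// Hypothetical helper: read width/height out of the first bytes of an image.
// Returns [width, height], or null if the bytes are not a recognizable GIF/PNG header.
function size_from_header(string $bytes): ?array
{
    // GIF: "GIF87a" or "GIF89a", then width and height as 16-bit little-endian.
    if (strlen($bytes) >= 10
        && (strncmp($bytes, 'GIF87a', 6) === 0 || strncmp($bytes, 'GIF89a', 6) === 0)) {
        $d = unpack('vw/vh', substr($bytes, 6, 4));
        return [$d['w'], $d['h']];
    }

    // PNG: 8-byte signature, 4-byte chunk length, "IHDR",
    // then width and height as 32-bit big-endian at offsets 16 and 20.
    if (strlen($bytes) >= 24
        && strncmp($bytes, "\x89PNG\r\n\x1a\n", 8) === 0
        && substr($bytes, 12, 4) === 'IHDR') {
        $d = unpack('Nw/Nh', substr($bytes, 16, 8));
        return [$d['w'], $d['h']];
    }

    return null; // unknown format, or not enough bytes
}

// Hypothetical helper: fetch only the first $length bytes of a URL.
// Requires allow_url_fopen; the timeout value is an assumption.
function fetch_head(string $url, int $length = 512): ?string
{
    $ctx = stream_context_create(['http' => ['timeout' => 5]]);
    $data = @file_get_contents($url, false, $ctx, 0, $length);
    return $data === false ? null : $data;
}
```

Something like size_from_header(fetch_head($url) ?? '') would then give you the dimensions, or null, in which case you can fall back to getimagesize() for that one image.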

comment followup:

Well, if there's no height/width, then you can jump straight to fetching the image (or first bit of the image) and extracting height/width directly. Checking the height/width in the tag is just to save you the trouble of having to fetch the image in the first place.

As for extracting the height/width from the HTML, it's just a matter of using ->getAttribute('width') and ->getAttribute('height') calls once you've found an <img> tag with the SimpleHTMLDOM. Something like this:

include 'simple_html_dom.php'; // adjust the path to wherever the library lives

$dom = file_get_html('http://example.com/somepage.html');
$images = $dom->find('img');

foreach ($images as $img) {
    $h = $img->getAttribute('height');
    $w = $img->getAttribute('width');

    // is_numeric() is false for a missing or non-numeric attribute (e.g. "100%")
    if (!is_numeric($h) || !is_numeric($w)) {
       // height and/or width not available in tag, so fetch image and get size that way
       $h = ...
       $w = ...
    }

    if (($h >= 50) && ($w >= 50)) {
        // image is at least 50x50, so display it...
    }
}

This probably won't work if you cut/paste it, since I'm just writing it off the top of my head, but it should be enough to get you started.
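
One more thing to watch for: the attribute values are not always clean numbers, since pages may write "50px" or "100%". A small helper along these lines could normalize them before comparing. attr_pixels() is a hypothetical name, and treating percentages as "unknown" is just one possible choice.

```php
<?php
// Hypothetical helper: turn a width/height attribute value into whole pixels,
// or null when the value is missing or not a plain pixel count (e.g. "100%").
function attr_pixels($value): ?int
{
    if (!is_string($value) && !is_int($value)) {
        return null; // missing attribute (null/false) or unexpected type
    }
    // Accept "50" and "50px"; reject percentages and anything else.
    if (preg_match('/^\s*(\d+)\s*(px)?\s*$/i', (string) $value, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

A null result means "fetch the image and measure it", the same fallback as when the attribute is missing entirely.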


It is difficult to help you since you didn't post any source code that you are using.

You should know that the height and width attributes won't necessarily be in the HTML, so simplehtmldom won't be useful to you for that. You will need to use something else. You are on the right track with getimagesize(). This function can time out if the host you are trying to reach isn't reachable. You need to handle this appropriately with set_time_limit(). You should also be checking for getimagesize() returning false, which is what it does when it fails.
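
A minimal sketch of that kind of error handling, assuming the goal from the question (skip images narrower than 50px). image_is_wide_enough() is a hypothetical name and the 5-second value is arbitrary. It bounds the network wait with the default_socket_timeout ini setting, which is a different knob from the set_time_limit() call mentioned above.

```php
<?php
// Hypothetical helper: true only if the image can be read and is at least $minWidth wide.
function image_is_wide_enough(string $src, int $minWidth = 50): bool
{
    // Give up on slow or unreachable hosts after 5 seconds (assumed value).
    ini_set('default_socket_timeout', '5');

    // getimagesize() returns false (and emits a warning) when it cannot read the image;
    // @ suppresses the warning so the false return can be handled here.
    $info = @getimagesize($src);
    if ($info === false) {
        return false; // unreadable or not an image: treat as "do not display"
    }

    return $info[0] >= $minWidth; // index 0 is the width, index 1 the height
}
```

Inside the foreach over the <img> tags you would call it with the image URL and only output the tag when it returns true.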
