
Regex to extract images from HTML - how to get only JPGs?

Developer https://www.devze.com 2023-03-14 01:50 Source: Internet

I am using this PHP function to grab all <img> tags within any given HTML.

function extract_images($content)
{
    $img    = strip_tags(html_entity_decode($content),'<img>');
    $regex  = '~src="[^"]*"~';    
    preg_match_all($regex, $img, $all_images);

    return $all_images;
}

This works and returns all images (gif, png, jpg, etc).

Anyone know how to change the regex...

~src="[^"]*"~

in order to only get files with JPG or JPEG extension?

Thanks a bunch.


Sooner or later the Regex Enforcement Agency will show up. It might as well be me :)

The proper way to do this is with an HTML DOM parser. Here's a DOMDocument solution. It is more robust than parsing the HTML with a regex, and it also lets you read or modify other attributes on your <img> nodes at the same time.

$dom = new DOMDocument();
$dom->loadHTML($content);

// To hold all your links...
$links = array();

// Get all images
$imgs = $dom->getElementsByTagName("img");
foreach ($imgs as $img) {
  // Check the src attribute of each img
  $src = $img->getAttribute("src");
  if (preg_match("/\.jpe?g$/i", $src)) {
    // Add it onto your $links array.
    $links[] = $src;
  }
}

See other answers for the simple regex solution, or adapt from the regex inside my foreach loop.
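
To illustrate the point about reaching other attributes, here is a minimal sketch that wraps the DOMDocument approach in a function and returns the alt text alongside each JPG's src. The function name extract_jpg_details and the returned array shape are made up for this example, and the libxml_use_internal_errors() calls are an addition that assumes you want to silence the warnings loadHTML() tends to raise on imperfect markup.

```php
<?php
// Hypothetical helper (not part of the original answer): returns src and alt
// for every <img> whose src ends in .jpg or .jpeg, case-insensitively.
function extract_jpg_details(string $content): array
{
    $dom = new DOMDocument();

    // Assumption: real-world HTML is often invalid, so collect libxml
    // warnings internally instead of printing them, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if (preg_match('/\.jpe?g$/i', $src)) {
            // getAttribute() returns an empty string when alt is missing.
            $images[] = ['src' => $src, 'alt' => $img->getAttribute('alt')];
        }
    }

    return $images;
}

$html = '<p><img src="a.JPG" alt="A"><img src="b.png" alt="B"><img src="c.jpeg"></p>';
print_r(extract_jpg_details($html));
```

Note that the `$` anchor means a src with a query string, such as photo.jpg?w=200, would not match; whether that matters depends on your HTML.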


/src="[^"]*\.(jpg|jpeg)"/i

The trailing i modifier makes the match case-insensitive, so .JPG and .Jpeg are matched as well.
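
For completeness, this is roughly how that pattern could slot into the function from the question. It is only a sketch: the name extract_jpg_images and the capture group (added so the function returns bare URLs rather than the whole src="..." text) are my own choices, and like the original regex it only sees double-quoted src attributes.

```php
<?php
// Sketch: the asker's function with a JPG/JPEG-only pattern.
// extract_jpg_images is a hypothetical name, not from the question.
function extract_jpg_images(string $content): array
{
    // Same preprocessing as the original: decode entities, keep only <img> tags.
    $img = strip_tags(html_entity_decode($content), '<img>');

    // Group 1 captures the URL itself; "jpe?g" covers both jpg and jpeg,
    // and the i modifier makes the extension case-insensitive.
    $regex = '~src="([^"]*\.jpe?g)"~i';
    preg_match_all($regex, $img, $matches);

    // $matches[1] holds the captured URLs; $matches[0] would hold src="...".
    return $matches[1];
}

$html = '<p><img src="a.jpg"><img src="b.GIF"><img src="c.JPEG"></p>';
print_r(extract_jpg_images($html)); // a.jpg and c.JPEG
```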
