Need regex help in PHP 5

Developer https://www.devze.com 2023-02-12 23:50 Source: Internet

Ok. Admittedly, I am not the best at working with regular expressions. What I am doing is a screen scrape, then trying to fix the img src values in the embedded images to point back to the original domain. This is the regex I have been trying variations of (too many to list - here's the current one):

preg_match_all('/<img\b[^>]*>/i', $html, $images);

What this ends up doing is to replace all < with />. What I need it to do is just return the (currently) five images on the page in an array so that I can work with those to fix their src values, then write them back to $html, which is set at the beginning of the file:

$html = file_get_contents($target_url);

Basically, don't do this with regex. You can parse HTML with regex, but it is almost certainly not worth the effort.
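To illustrate one way the pattern from the question can go wrong, here is a minimal sketch (the sample markup is made up for illustration). A > inside an attribute value ends the match early. It also shows that preg_match_all() only collects the matches into $images[0] and leaves $html itself untouched.

```php
<?php
// Made-up sample markup; the first alt text contains a literal '>'.
$html = '<p><img src="a.png" alt="1 > 0"><img src="/b.png"></p>';
$before = $html;

// Same pattern as in the question.
$count = preg_match_all('/<img\b[^>]*>/i', $html, $images);

// $images[0] holds the full matches; $html is only read, never modified.
var_dump($count);             // number of matches found
var_dump($images[0]);         // the first entry is cut off at the '>' inside alt
var_dump($html === $before);  // confirms $html is unchanged
```

A DOM parser reads the quoted attribute as a whole, so it does not trip over this case.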

Do it with genuine DOM parsing instead, using the DOMDocument class:

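// Scraped HTML is often invalid; keep libxml's parse warnings out of the output.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

// Prefix each <img> src with the original domain (assumes every src is relative).
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    $image->setAttribute('src', 'http://example.com/' . $image->getAttribute('src'));
}
// Serialize the modified document back into $html.
$html = $dom->saveHTML();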
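If some of the images on the page already have absolute URLs, prefixing every src would break them. The following is a rough sketch of one way to handle that, not a definitive implementation. The function name absolutize_img_src and its $base parameter are hypothetical, and it assumes every relative src should be resolved against the site root, which is a simplification of real URL resolution (paths relative to the page's own directory are not handled).

```php
<?php
// Hypothetical helper: rewrites relative <img src> values against $base.
// Assumption: relative paths are treated as relative to the site root.
function absolutize_img_src($html, $base)
{
    // Collect libxml parse warnings internally; scraped markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base = rtrim($base, '/');
    foreach ($dom->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');
        // Skip empty values, URLs with a scheme (http:, data:, ...) and
        // protocol-relative URLs (//host/path).
        if ($src === '' || preg_match('#^([a-z][a-z0-9+.-]*:|//)#i', $src)) {
            continue;
        }
        $image->setAttribute('src', $base . '/' . ltrim($src, '/'));
    }

    // Serialize the modified document back to a string.
    return $dom->saveHTML();
}

// Usage with made-up markup and the placeholder domain from the answer.
$sample = '<p><img src="/a.png"><img src="b/c.png">'
        . '<img src="http://cdn.example.org/d.png"></p>';
$out = absolutize_img_src($sample, 'http://example.com/');
echo $out;
```

Note that saveHTML() returns a complete document, so the result may include a doctype and html/body wrappers that were not in the input.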