
Close all HTML unclosed IMG tags

Developer https://www.devze.com 2022-12-23 05:13 Source: Internet

Is it possible to do a regex replace on all IMG tags that are unclosed? If so, how would I identify:

  <img src="..." alt="...">

...as a potential candidate to be replaced by:

  <img src="..." alt="..."/>

Update: We have hundreds of pages and thousands of image tags, most of which must be closed. I'm not stuck on regex; any other method, aside from manually updating all IMG tags, would suffice.


(<img[^>]+)(?<!/)>

will match an img tag that is not self-closed. It requires a regex flavor that supports lookbehind (JavaScript engines before ES2018 and Ruby 1.8 don't, but most others do). Backreference no. 1 will contain everything up to, but not including, the closing >, so if you search for this regex and replace with \1/> you should be good to go.
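For what it's worth, here is a minimal sketch of that search-and-replace in Python, whose `re` module supports this kind of fixed-width lookbehind. The function name `close_img_tags_regex` is just made up for the illustration.

```python
import re

# Pattern from the answer above: an <img ...> tag whose ">" is not preceded by "/".
UNCLOSED_IMG = re.compile(r"(<img[^>]+)(?<!/)>")


def close_img_tags_regex(html):
    """Insert "/" before the ">" of every <img> tag that lacks it."""
    return UNCLOSED_IMG.sub(r"\1/>", html)


print(close_img_tags_regex('<img src="a.gif" alt="a"><img src="b.gif" />'))
```

Tags that already end in /> are left untouched, so running it twice over the same file should be harmless.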

If you need to account for the possibility of > inside attributes, you could use

(<img("[^"]*"|[^>])+)(?<!/)>

This will match, e.g.,

<img src="image.gif" alt="hey, look--->">
<img src="image/image.gif">

and leave

<img src="image/image.gif" />

alone.
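A quick way to check this against the three samples, sketched in Python (the name `close_img_tags_quoted` is hypothetical, the pattern is the one given above):

```python
import re

# Same pattern as above: quoted attribute values may contain ">".
UNCLOSED_IMG_QUOTED = re.compile(r'(<img("[^"]*"|[^>])+)(?<!/)>')


def close_img_tags_quoted(html):
    """Insert "/" before the final ">" of <img> tags that are not self-closed."""
    return UNCLOSED_IMG_QUOTED.sub(r"\1/>", html)


samples = [
    '<img src="image.gif" alt="hey, look--->">',
    '<img src="image/image.gif">',
    '<img src="image/image.gif" />',
]
for tag in samples:
    print(close_img_tags_quoted(tag))
```

One caveat worth testing on real data: because [^>] can also match a double quote, backtracking may still find a match inside an already self-closed tag that has > in a quoted attribute (for example alt="a>b" followed by />). Changing the second alternative to [^">] looks like a reasonable way to tighten that, but treat it as an untested assumption.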


In HTML the end tag for an <img> "must be omitted", so the start tag closes the element and you can't have an unclosed img.

If you want to convert your HTML to XHTML then use a real parser. Regular Expressions aren't a very good tool for this job.
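As a rough illustration of the parser route, here is a sketch using Python's standard-library `html.parser`. It only re-emits the input with img start tags self-closed; it is not a full HTML-to-XHTML converter, and the names `ImgCloser` and `close_img_tags_parsed` are made up for the example.

```python
from html.parser import HTMLParser


class ImgCloser(HTMLParser):
    """Re-emit the input, rewriting <img ...> start tags to end in " />"."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()  # start tag exactly as it appeared
        if tag == "img":
            raw = raw[:-1].rstrip() + " />"
        self.out.append(raw)

    def handle_startendtag(self, tag, attrs):
        # Already written as <tag ... />, so pass it through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def close_img_tags_parsed(html):
    parser = ImgCloser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(close_img_tags_parsed('<p>a<img src="x.gif" alt="a > b">b<img src="y.gif" /></p>'))
```

The parser takes care of quoting, so a > inside an attribute value is not mistaken for the end of the tag. End tags come back lower-cased and other details of the original markup may be normalized, so diff the output before committing it.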


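To replace all unclosed IMG tags:

import re

content = "text<img src='img.jpg'>text<img src='img.png' >text"
# The optional "/" in the pattern keeps already self-closed tags from getting a second slash.
content = re.sub(r'(<img\b[^>]*?)\s*/?>', r'\1 />', content)
print(content)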

lookbehind is cool though


What exactly do you mean by "unclosed"?

 <img src="a1.jpg    <-- no closing quote and no closing angle bracket
 <img src="a1.jpg"   <-- no closing angle bracket
 <img src="a1.jpg">  <-- the tag does not self-close, as it should in XHTML

You can try to find such suspects intelligently, but no approach is guaranteed to be foolproof.
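To make the three cases concrete, here is a rough heuristic sketch in Python that labels each `<img` fragment. It is an assumption-laden illustration (it only understands double-quoted values and treats the next `<` as the end of the fragment), and the names `classify_img` and `find_suspects` are invented for it.

```python
import re


def classify_img(fragment):
    """Return a rough label for one "<img ..." fragment (heuristic only)."""
    if fragment.count('"') % 2:
        return "unbalanced quotes"
    # Blank out quoted values so a ">" inside an attribute is ignored.
    bare = re.sub(r'"[^"]*"', '""', fragment)
    end = bare.find(">")
    if end == -1:
        return "missing >"
    if not bare[:end + 1].endswith("/>"):
        return "not self-closed"
    return "ok"


def find_suspects(html):
    # Each fragment runs from "<img" up to the next "<" or the end of input.
    return [(m.group(0), classify_img(m.group(0)))
            for m in re.finditer(r"<img\b[^<]*", html)]


for fragment, label in find_suspects('<img src="a1.jpg\n<img src="a1.jpg"\n<img src="a1.jpg">'):
    print(label, repr(fragment))
```

Anything it flags still needs a human look; a value such as alt="a < b" is enough to confuse it.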


I have never tried this, but a closed img tag is a tag that begins with <img, has some content, and ends with />.

Here is something I tried in Perl:

#!/usr/bin/env perl
use strict;
use warnings;

my @images = ('<img src="toto.jpg">',
              '<img src="truc/machin.jpg" title="pouet" >',
              '<img        src="pouet.jpg" alt="toto" />',
              '<img src="math/a-greater-than-b.png" alt="a > b">');

foreach (@images) {
    # Match tags whose attribute list is followed directly by ">" (no "/"); $1 holds the attributes.
    if (/<img\s+(([a-z]+=".*?")+\s*)>/) {
        print "Match : <img $1 />\n";
    }
}

Produces:

Match : <img src="toto.jpg" />
Match : <img src="truc/machin.jpg" title="pouet"  />
Match : <img src="math/a-greater-than-b.png" alt="a > b" />