Close all HTML unclosed IMG tags

Developer https://www.devze.com 2022-12-23 05:13 Source: web

Related topics: regex xhtml

Is it possible to do a regex replace on all IMG tags that are unclosed? If so, how would I identify:

  <img src="..." alt="...">

...as a potential candidate to be replaced?

   = <img src="..." alt="..."/>

Update: We have hundreds of pages and thousands of image tags, all of which must be closed. I'm not stuck on RegEx -- any other method, aside from manually updating all IMG tags, would suffice.

(<img[^>]+)(?<!/)>

will match an img tag that is not properly closed. It requires that the regex flavor you're using supports lookbehind (Ruby 1.8 and JavaScript engines older than ES2018 don't, but most others do). Capture group no. 1 will contain everything up to the final >, so if you search for this regex and replace with \1/> you should be good to go.
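
As a minimal sketch of that search-and-replace, here is how it might look with Python's re module, which supports lookbehind. The sample string and the names UNCLOSED_IMG, html and fixed are made up for illustration.

```python
import re

# Pattern from the answer above: group 1 captures everything up to the final '>',
# and the lookbehind (?<!/) rejects tags that already end in '/>'.
UNCLOSED_IMG = re.compile(r'(<img[^>]+)(?<!/)>')

# Hypothetical sample input: one unclosed tag and one already closed tag.
html = '<p><img src="a.png" alt="A"> and <img src="b.png"/></p>'
fixed = UNCLOSED_IMG.sub(r'\1/>', html)
print(fixed)
```

Only the first tag is rewritten; the second already ends in /> and is left as it is.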

If you need to account for the possibility of > inside attributes, you could use

(<img("[^"]*"|[^>])+)(?<!/)>

This will match, e.g.,

<img src="image.gif" alt="hey, look--->">
<img src="image/image.gif">

and leave

<img src="image/image.gif" />

alone.
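
To check that behaviour, the three sample tags can be run through the attribute-aware pattern. This is only a sketch using Python's re module; the names UNCLOSED_IMG, samples and results are illustrative.

```python
import re

# Attribute-aware pattern from the answer: a double-quoted value may contain '>'.
UNCLOSED_IMG = re.compile(r'(<img("[^"]*"|[^>])+)(?<!/)>')

samples = [
    '<img src="image.gif" alt="hey, look--->">',
    '<img src="image/image.gif">',
    '<img src="image/image.gif" />',
]

# Group 1 is the outer group, so \1 is the whole tag minus its final '>'.
results = [UNCLOSED_IMG.sub(r'\1/>', s) for s in samples]
for before, after in zip(samples, results):
    print(before, '->', after)
```

The first two tags gain a trailing /, and the third comes back unchanged.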

In HTML the end tag for an <img> "must be omitted", so the start tag closes the element and you can't have an unclosed img.

If you want to convert your HTML to XHTML then use a real parser. Regular Expressions aren't a very good tool for this job.
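
One way to follow that advice without extra dependencies is sketched below, assuming Python and its standard-library html.parser. The parser is used only to locate img start tags, and the original text is then patched at those offsets, so the rest of the markup is left byte-for-byte intact. The names UnclosedImgLocator and close_img_tags are hypothetical, and this is a sketch, not a full HTML-to-XHTML converter.

```python
from html.parser import HTMLParser


class UnclosedImgLocator(HTMLParser):
    """Collect (start, end) offsets of <img ...> tags that lack a trailing '/>'."""

    def __init__(self, source):
        super().__init__(convert_charrefs=False)
        # getpos() reports (line, column); keep each line's absolute start offset.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.spans = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            line, col = self.getpos()
            start = self.line_starts[line - 1] + col
            self.spans.append((start, start + len(self.get_starttag_text())))

    def handle_startendtag(self, tag, attrs):
        pass  # the parser routes '<img ... />' here; it is already closed


def close_img_tags(source):
    locator = UnclosedImgLocator(source)
    locator.feed(source)
    locator.close()
    # Patch from the end of the text so earlier offsets stay valid.
    for start, end in reversed(locator.spans):
        tag = source[start:end]
        source = source[:start] + tag[:-1].rstrip() + " />" + source[end:]
    return source


# Hypothetical sample: an img inside a comment, a '>' inside an attribute,
# and a tag that is already closed.
page = '<!-- <img src="old.png"> -->\n<img src="a.png" alt="a > b">\n<img src="b.png" />'
print(close_img_tags(page))
```

Because the parser understands comments and quoted attribute values, the img inside the comment is ignored and the > inside alt="a > b" does not end the tag early.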

To replace all unclosed IMG tags:

import re

content = "text<img src='img.jpg'>text<img src='img.png' >text"
# [^/>] before the final > skips tags that already end in />
content = re.sub(r'(<img\b[^>]*[^/>])>', r'\1/>', content)
print(content)

lookbehind is cool though

What exactly do you mean by "unclosed"?

 <img src="a1.jpg    <-- no closing quote and no closing angle bracket
 <img src="a1.jpg"   <-- no closing angle bracket
 <img src="a1.jpg">  <-- the tag does not self-close as should be done in XHTML

You can try to find such suspects intelligently, but no approach is guaranteed to be foolproof.
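
As a rough illustration of telling those cases apart, here is a small Python heuristic. The function name classify_img and its return labels are invented for this sketch, and it assumes it is handed a single img fragment that uses double quotes; real pages will defeat it easily.

```python
def classify_img(fragment):
    """Rough heuristic: name the first problem found in a single <img ...> fragment."""
    text = fragment.strip()
    if text.count('"') % 2 == 1:
        return "unterminated quote"       # e.g. <img src="a1.jpg
    if not text.endswith(">"):
        return "missing closing bracket"  # e.g. <img src="a1.jpg"
    if not text.endswith("/>"):
        return "not self-closed"          # e.g. <img src="a1.jpg">
    return "closed"


fragments = ['<img src="a1.jpg', '<img src="a1.jpg"', '<img src="a1.jpg">', '<img src="a1.jpg" />']
for fragment in fragments:
    print(fragment, "->", classify_img(fragment))
```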

I have never tried this, but a closed img tag is one that begins with <img, has some content in between, and ends with />.

Here is something I tried in Perl:

#!/usr/bin/env perl
use strict;
use warnings;

my @images = ('<img src="toto.jpg">',
              '<img src="truc/machin.jpg" title="pouet" >',
              '<img        src="pouet.jpg" alt="toto" />',
              '<img src="math/a-greater-than-b.png" alt="a > b">');

foreach (@images) {
    # Match tags whose attributes are followed by optional whitespace and '>', not '/>'.
    if (/<img\s+(([a-z]+=".*?")+\s*)>/) {
        print "Match : <img $1 />\n";
    }
}

Produces:

Match : <img src="toto.jpg" />
Match : <img src="truc/machin.jpg" title="pouet"  />
Match : <img src="math/a-greater-than-b.png" alt="a > b" />
