
Matching a specific tag with a specific attribute without being greedy in vs-2005

Developer https://www.devze.com 2022-12-15 23:45 Source: Internet

I have a bunch of files (in the hundreds) that have img tags like the following:

<img randomAttr1="randomVal" randomAttr2="valueRand" border="0" 
     randomAttr3="someRandValue">

I'm trying to do a search and replace operation in Visual Studio 2005 that will identify a tag as an <img>, but only match the border="0" portion of the string.

My belief is that I need a non-greedy portion of the regular expression to "match" (and I use the term loosely) the img tag and then actually match the border attribute, so that I can remove it.

I'm using regular expressions to do this because almost none of the markup is well formed.

My goal here is to remove the border attributes from all of the img tags.

I've tried the following regex, but I can't seem to get it to match only the border attribute:

(\<img)#.@border=\"[0-9]+\"

I believe '#' and '@' are the non-greedy quantifiers, as that is what the documentation for VS-2005 says, so I would not expect the pattern to match so many characters; however, it matches everything from the <img all the way to the end of the border="0" attribute.
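The behaviour described above can be reproduced outside Visual Studio. The following is a minimal Python sketch, assuming that Python's lazy `.+?` is a fair stand-in for the VS 2005 `.#` (an analogy only, not the VS engine itself). A lazy quantifier limits how far the match extends to the right, but the match still begins at <img, so the whole prefix is part of the matched text.

```python
import re

html = ('<img randomAttr1="randomVal" randomAttr2="valueRand" '
        'border="0" randomAttr3="someRandValue">')

# `.+?` is lazy: it stops at the first border="..." it can reach,
# but the match still starts at "<img" and includes everything in between.
pattern = re.compile(r'<img.+?border="[0-9]+"')

matched = pattern.search(html).group(0)
print(matched)
```

So laziness alone cannot shrink the match to just the attribute; the prefix has to be captured and put back in the replacement.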


Try the following (tested):

Find: {\<img.#}border=\"[0-9]+\"
Replace: \1

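The braces tag everything from <img up to the attribute, and \1 in the replacement puts that text back, so only the border attribute is removed. Note that this won't match tags that have a newline between the tag name and the border attribute.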
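For comparison, here is a rough Python sketch of the same capture-and-restore idea. It is not the VS 2005 syntax, and the name `strip_img_border` is hypothetical. It uses `[^>]` instead of `.` so that the match can cross newlines inside a tag but cannot run past the end of the tag. It assumes that attribute values contain no `>` and that the border value is double-quoted.

```python
import re

# Group 1 captures the tag up to (not including) the whitespace before border.
# [^>]*? is lazy and also matches newlines, but never leaves the current tag.
BORDER_IN_IMG = re.compile(r'(<img\b[^>]*?)\s+border="[0-9]+"', re.IGNORECASE)

def strip_img_border(html: str) -> str:
    """Remove a double-quoted numeric border attribute from <img> tags."""
    return BORDER_IN_IMG.sub(r'\1', html)

sample = ('<img randomAttr1="randomVal" randomAttr2="valueRand"\n'
          '     border="0" randomAttr3="someRandValue">')
cleaned = strip_img_border(sample)
print(cleaned)
```

Tags other than img are left untouched, because the pattern must start at <img and cannot cross a `>`.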


Don't be so quick to give up on real parsers. For example, given near-garbage input of

<TagSoup>lskdjfs
sdfkljs sdfalkjdfs
<img randomAttr1=randomVal randomAttr2="valueRand" border="0" 
     randomAttr3="someRandValue">
sdklfjsdflkj
<img randomAttr1="randomVal" randomAttr2="valueRand123"
     randomAttr3=someRandValue456>

the code below deletes the border attribute.

#! /usr/bin/perl

use warnings;
use strict;

use HTML::Parser;

sub start {
  my($tag,$attr,$attrseq,$text,$skipped) = @_;

  print $skipped;
  unless (lc($tag) eq "img") {   # case_sensitive keeps the original case, so compare in lowercase
    print $text;
    return;
  }

  my $changed = 0;
  my @seq;
  for (@$attrseq) {
    if (lc($_) eq "border" && $attr->{$_} =~ /^\s*0+\s*$/) {
      delete $attr->{$_};
      $changed = 1;
    }
    else {
      push @seq => $_;
    }
  }

  if ($changed) {
    print "<$tag ",
            join(" " => map qq[$_="$attr->{$_}"], @seq),
          ">";
  }
  else {
    print $text;
  }
}

die "Usage: $0 html-file\n" unless @ARGV == 1;
my $p = HTML::Parser->new(
  api_version => 3,
  marked_sections => 1,
  case_sensitive => 1,
  start_h => [ \&start => "tag, attr, attrseq, text, skipped_text" ],
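  end_h => [ sub { print @_ } => "skipped_text, text" ],
  # print any text that follows the last tag
  end_document_h => [ sub { print @_ } => "skipped_text" ],
);

undef $/;        # slurp the whole file in one read
$p->parse(<>);
$p->eof;         # signal end of input so buffered text is flushed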