
Regexp match any uppercase characters except a particular string

Developer https://www.devze.com 2022-12-09 22:49 Source: Internet

I want to match all lines that contain any uppercase character, ignoring the string A_.

To add to the complication, I also want to ignore everything after a different string, e.g. an opening comment marker (/*).

Here are examples of what should and shouldn't match:

Matches:

  • fooBar
  • foo Bar foo
  • A_fooBar
  • fooBar /* Comment */

Non-matches (A_ should not trigger a match):

  • A_foobar
  • foo A_bar
  • foobar
  • foo bar foo bar
  • foobar /* Comment */

thanks :)


This should (also?) do it:

(?!A_)[A-Z](?!((?!/\*).)*\*/)

A short explanation:

(?!A_)[A-Z]     # if no 'A_' can be seen, match any uppercase letter
(?!             # start negative look ahead
  ((?!/\*).)    #   if no '/*' can be seen, match any character (except line breaks)
  *             #   match zero or more of the previous match
  \*/           #   match '*/'
)               # end negative look ahead

So, in plain English:

Match any uppercase letter unless it is the 'A' of 'A_', and also reject an uppercase letter if '*/' can be seen ahead of it without first encountering '/*' (that is, the letter sits inside a comment).
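To check that reading against the examples from the question, here is a minimal sketch in Python. The pattern is copied unchanged from above; the names UPPER_OUTSIDE_COMMENT and has_match are made up for illustration, and Python's re module is only one of the flavors the pattern should work in.

```python
import re

# Pattern from the answer above, unchanged.
UPPER_OUTSIDE_COMMENT = re.compile(r"(?!A_)[A-Z](?!((?!/\*).)*\*/)")

should_match = ["fooBar", "foo Bar foo", "A_fooBar", "fooBar /* Comment */"]
should_not_match = [
    "A_foobar",
    "foo A_bar",
    "foobar",
    "foo bar foo bar",
    "foobar /* Comment */",
]


def has_match(line: str) -> bool:
    """Return True when the pattern finds a qualifying uppercase letter."""
    return UPPER_OUTSIDE_COMMENT.search(line) is not None


for line in should_match:
    assert has_match(line), f"expected a match: {line!r}"
for line in should_not_match:
    assert not has_match(line), f"expected no match: {line!r}"
```

The script raises an AssertionError on the first example that behaves unexpectedly and stays silent otherwise.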


My answer:

/([B-Z]|A[^_]|A$)/

I would remove the comment at an earlier stage, if at all possible.

Test:

#!perl
use warnings;
use strict;

# Lines that must match.
my @matches = (
"fooBar",
"foo Bar foo",
"A_fooBar",
"fooBar /* Comment */");

# Lines that must not match.
my @nomatches = (
"A_foobar",
"foo A_bar",
"foobar",
"foo bar foo bar",
"foobar /* Comment */");

# Any of B-Z, or an A that is not followed by an underscore.
my $regex = qr/([B-Z]|A[^_]|A$)/;

for my $m (@matches) {
    $m =~ s:/\*.*$::;    # strip the comment before testing
    die "FAIL $m" unless $m =~ $regex;
}
for my $m (@nomatches) {
    $m =~ s:/\*.*$::;    # strip the comment before testing
    die "FAIL $m" unless $m !~ $regex;
}

Try it: http://codepad.org/EJhWtqkP


Try:

(?<!A_)[a-zA-Z]+

(?<!...) is called a negative lookbehind. ((?!...), without the <, is a negative lookahead.)
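The difference between looking behind and looking ahead matters for A_, because the A comes before the underscore. A small sketch in Python, using a made-up sample string, shows which letter each form rejects:

```python
import re

text = "A_Bar"  # hypothetical sample input

# Negative lookahead: refuse a letter when "A_" starts at this position.
ahead = re.findall(r"(?!A_)[A-Z]", text)

# Negative lookbehind: refuse a letter when "A_" ends just before this position.
behind = re.findall(r"(?<!A_)[A-Z]", text)

print(ahead)   # ['B']
print(behind)  # ['A']
```

So the lookbehind form skips the letter that follows A_, not the A itself, which is the opposite of what the question asks for.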

As for your specific problem, it's kind of cheating but try:

^([#\.]|(?<!A_))[A-Za-z]{2,}

I get:

fooBar => fooBar
foo Bar foo => foo
A_fooBar (no match)
fooBar /* Comment */ => fooBar
A_foobar (no match)
foo A_bar => foo
foobar => foobar
foo bar foo bar => foo
foobar /* Comment */ => foobar


Does it have to be a single regex? In perl, you could do something like:

if ($string =~ /[A-Z]/ && $string !~ /A_/)

It's not as cool as a single expression with lookbehind, but it's probably easier to read and maintain.
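Note that the condition as written rejects every line containing A_, including A_fooBar from the "Matches" list. A variation on the same multi-step idea, sketched in Python under the assumption that a comment always runs to the end of the line, removes the comment and every literal A_ first and only then looks for uppercase. The function name has_relevant_uppercase is hypothetical.

```python
import re


def has_relevant_uppercase(line: str) -> bool:
    # Step 1: drop everything from the first "/*" to the end of the line.
    code = line.split("/*", 1)[0]
    # Step 2: delete each literal "A_" so its "A" cannot count as uppercase.
    code = code.replace("A_", "")
    # Step 3: any uppercase letter still present is a genuine hit.
    return re.search(r"[A-Z]", code) is not None


for line in ["fooBar", "A_fooBar", "foo A_bar", "foobar /* Comment */"]:
    print(f"{line!r}: {has_relevant_uppercase(line)}")
```

This trades the single expression for three plain steps, each of which can be changed on its own, for example if the ignored string or the comment marker changes.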


This one does it, although the comment handling isn't extremely robust. (It assumes that a comment is always at the end of the line.)

.*((A(?!_)|([B-Z]))(?<!/\*.*)).*\r\n


Try this:

^(?:[^A-Z/]|A_|/(?!\*))*+[A-Z]

This will work in any flavor that supports possessive quantifiers, e.g. PowerGREP, Java and PHP. The .NET flavor doesn't, but it does support atomic groups:

^(?>(?:[^A-Z/]|A_|/(?!\*))*)[A-Z]

If neither of those features is available, you can use another lookahead to prevent it matching the A_ on the rebound:

^(?:[^A-Z/]|A_|/(?!\*))*(?!A_)[A-Z]
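As a rough check of the last variant, which needs neither possessive quantifiers nor atomic groups, here is a sketch in Python. The pattern is copied from above; the names LINE_WITH_UPPER and upper_index are invented for the example. Because the pattern is anchored at the start of the line and ends on the uppercase letter, the end of the match gives that letter's position.

```python
import re
from typing import Optional

# Lookahead-only variant from the answer above, unchanged.
LINE_WITH_UPPER = re.compile(r"^(?:[^A-Z/]|A_|/(?!\*))*(?!A_)[A-Z]")


def upper_index(line: str) -> Optional[int]:
    """Return the index of the uppercase letter the match ends on, or None."""
    m = LINE_WITH_UPPER.match(line)
    return m.end() - 1 if m else None


print(upper_index("fooBar"))                # 3
print(upper_index("A_fooBar"))              # 5
print(upper_index("foo A_bar"))             # None
print(upper_index("foobar /* Comment */"))  # None
```

The prefix group can only step over characters that are not uppercase and not a slash, a whole A_, or a slash that does not open a comment, so the scan stops at /* and never reaches letters inside the comment.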