
Regex enclosed with () Issue

Developer (https://www.devze.com), 2023-02-03 23:28, source: web

I have the following regular expression:

(\\w+[ ]*|-\\w+[ ]*)(!=|<=|>=|=|<|>| not in | in | not like | like )(.*)

It has three sections (capturing groups), each enclosed in its own pair of parentheses.

When I try to match this against something like

product(getProduct_abc) in (Xyz)

It's not matching the regex.

But when I try to match

100=product(getProduct_abc) in (Xyz)

it matches perfectly.

What's wrong with the regex?
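
A minimal Perl reproduction of the behaviour described above. It assumes two things the question does not state: that the doubled backslashes are string-literal escaping (so the pattern itself uses '\w'), and that the pattern is applied as an unanchored search, as Perl's m// does. An API that requires the whole string to match may behave differently. 'try_original' is a hypothetical helper name.

```perl
use strict;
use warnings;

# The question's pattern, assuming single backslashes in the actual regex.
my $original = qr/(\w+[ ]*|-\w+[ ]*)(!=|<=|>=|=|<|>| not in | in | not like | like )(.*)/;

# Returns the three captures on success, or an empty list on failure.
sub try_original {
    my ($text) = @_;
    return $text =~ $original ? ($1, $2, $3) : ();
}

for my $text ('product(getProduct_abc) in (Xyz)',
              '100=product(getProduct_abc) in (Xyz)') {
    my @groups = try_original($text);
    if (@groups) {
        print "'$text' matches: ", join(' | ', map { sprintf "'%s'", $_ } @groups), "\n";
    } else {
        print "'$text' does not match\n";
    }
}
```

On the first string the search finds no position where word characters are immediately followed by one of the operators; on the second it captures '100', '=' and the rest of the string.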


Nothing is wrong, per se, with the regular expression. It just does not match the specified string.

You need to find a good reference on regular expressions and learn the basics. One is http://www.regular-expressions.info/ , though it may or may not suit you as a beginner. (I am using RegexBuddy, a tool by the same author, to test your regular expression.)

Here is a rough breakdown of the expression:

  • There are three capturing groups, each surrounded by its own pair of parentheses. (Note that parentheses, along with many other characters, have a special meaning in regular expressions, so to match a literal parenthesis of either direction you need to escape it. The given regular expression does not do this.)
  • In the first capturing group there are two possible choices for a match. They are:
    • One or more "word" characters followed by zero or more spaces, or
    • A dash, followed by one or more "word" characters followed by zero or more spaces
  • In the second capturing group, there are 10 possible matches: the listed operator symbols (without surrounding spaces), or the listed textual operators (with surrounding spaces)
  • In the third capturing group, zero or more of any character at all will match.
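
The breakdown above can be made concrete by rewriting the same pattern with Perl's /x (extended) modifier, one commented line per capturing group. This is a readability sketch only; 'same_result' is a hypothetical helper that checks the rewrite captures exactly what the compact form does, and the sample 'qty >= 5' is made up for illustration.

```perl
use strict;
use warnings;

# The question's pattern, written compactly (single backslashes assumed).
my $compact = qr/(\w+[ ]*|-\w+[ ]*)(!=|<=|>=|=|<|>| not in | in | not like | like )(.*)/;

# The same pattern under /x: unescaped whitespace and comments are ignored,
# so literal spaces are written as a backslash followed by a space.
my $annotated = qr/
    ( \w+ \ * | - \w+ \ * )              # group 1: word chars, optional leading dash, trailing spaces
    ( != | <= | >= | = | < | >           # group 2: symbolic operators, no spaces
    | \ not\ in\  | \ in\                #   or textual operators with spaces around
    | \ not\ like\  | \ like\  )
    ( .* )                               # group 3: anything up to the end of the line
/x;

# True when both patterns give the same captures, or both fail, on $text.
sub same_result {
    my ($text) = @_;
    my @from_compact   = $text =~ $compact;
    my @from_annotated = $text =~ $annotated;
    return join('|', @from_compact) eq join('|', @from_annotated);
}

for my $text ('100=product(getProduct_abc) in (Xyz)',
              'product(getProduct_abc) in (Xyz)',
              'qty >= 5') {
    print "'$text': ", (same_result($text) ? 'same' : 'different'), "\n";
}
```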

The string 'product(getProduct_abc) in (Xyz)' fails to match because the text before the 'in' operator is not made up only of "word" characters (letters, digits and underscore). The parentheses are not "word" characters, so nothing in the string satisfies the first group immediately followed by an operator, and the match fails.

The second string ('100=product(getProduct_abc) in (Xyz)') matches because '100' consists entirely of "word" characters (the first capturing group), the equals sign ('=') is matched as the operator for the second capturing group, and everything after the '=' matches the "any character" part. Note that, depending on how the end of the string is handled, some languages might not match even this string.

If the first string is supposed to match, then you need to check with your business users. Maybe they are beginners with regular expressions too, and gave you one that doesn't work. ;-)


This is what I see:

'100=product(getProduct_abc) in (Xyz)'  
Group1 match = '100'  
Group2 match = '='  
Group3 match = 'product(getProduct_abc) in (Xyz)'  

'product(getProduct_abc) in (Xyz)'  
        ^  
    Fails here on Group1 match because parentheses are not included in this group  
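
The caret position can be reproduced in Perl by applying only Group 1, anchored at the start of the string, and looking at the character right after what it consumes. 'group1_stop' is a hypothetical helper for this illustration, and the single-backslash form of the question's pattern is assumed.

```perl
use strict;
use warnings;

# Group 1 of the original pattern, anchored at the start of the string.
my $group1 = qr/^(\w+[ ]*|-\w+[ ]*)/;

# Returns the text Group 1 consumes at the start of $text and the
# character right after it, or an empty list if Group 1 cannot start there.
sub group1_stop {
    my ($text) = @_;
    return unless $text =~ $group1;
    my $taken = $1;
    return ($taken, substr($text, length $taken, 1));
}

my ($taken, $stopper) = group1_stop('product(getProduct_abc) in (Xyz)');
print "Group 1 takes '$taken', then stops at '$stopper'\n";
```

The stopping character is not one of the operators Group 2 accepts, which is why the diagram marks the failure there.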

You can fix this by forcing the match onto the last occurrence of the group 1, 2, 3 sequence in the string.
By rewriting the equivalent Group1 match so that it also accepts parentheses, and defining each group separately, the groups can be recombined to force the last possible match.

rxP1 = '(?:-?[\w()]+\ *)';
rxP2 = '(?:!=|<=|>=|=|<|>| not in | in | not like | like )';
rxP3 = '(?:.*?)';

rxAll = /(?:$rxP1$rxP2$rxP3)*($rxP1)($rxP2)($rxP3)$/;
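
A quick Perl check of the Group 1 rewrite in rxP1: making the dash optional ('-?') lets one alternative cover both of the original ones, and widening the class to '[\w()]' is what admits 'product(getProduct_abc)'. The test tokens are illustrative and not from the question.

```perl
use strict;
use warnings;

# Original Group 1: two alternatives, with or without a leading dash.
my $old_group1 = qr/^(?:\w+[ ]*|-\w+[ ]*)$/;
# Collapsed form: an optional dash replaces the alternation.
my $collapsed  = qr/^(?:-?\w+ *)$/;
# The answer's rxP1: parentheses are also allowed.
my $new_group1 = qr/^(?:-?[\w()]+\ *)$/;

for my $token ('100', '-abc ', 'product(getProduct_abc)', '(Xyz)', '- x') {
    printf "%-26s old=%d collapsed=%d new=%d\n", "'$token'",
        ($token =~ $old_group1 ? 1 : 0),
        ($token =~ $collapsed  ? 1 : 0),
        ($token =~ $new_group1 ? 1 : 0);
}
```

One trade-off: '[\w()]' accepts parentheses anywhere and does not check that they are balanced, so a token such as 'a)(b' is accepted as well.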

In Perl:

use strict;
use warnings;

my @samples = (
 'product(getProduct_abc) in (Xyz1)',
 '100=product(getProduct_abc) in (Xyz2)',
 '100 like = != not like >product(getProduct_abc) in (Xyz3)',
);

my $rxP1 = '(?:-?[\w()]+\ *)';
my $rxP2 = '(?:!=|<=|>=|=|<|>| not in | in | not like | like )';
my $rxP3 = '(?:.*?)';

for (@samples)
{
    if ( /(?:$rxP1$rxP2$rxP3)*($rxP1)($rxP2)($rxP3)$/ ) {
        print "\n1 = '$1'\n";
        print "2 = '$2'\n";
        print "3 = '$3'\n";
    }
}

Output:

1 = 'product(getProduct_abc)'
2 = ' in '
3 = '(Xyz1)'

1 = 'product(getProduct_abc)'
2 = ' in '
3 = '(Xyz2)'

1 = 'product(getProduct_abc)'
2 = ' in '
3 = '(Xyz3)'
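
To see why forcing the last occurrence matters, here is a Perl comparison against a pattern that only widens Group 1 and keeps the question's plain '(.*)' tail. The comparison is an added illustration; it reuses the answer's $rxP1/$rxP2/$rxP3 strings on the third sample.

```perl
use strict;
use warnings;

# Building blocks copied from the answer.
my $rxP1 = '(?:-?[\w()]+\ *)';
my $rxP2 = '(?:!=|<=|>=|=|<|>| not in | in | not like | like )';
my $rxP3 = '(?:.*?)';

# Group 1 widened, but no leading repeated group and no end anchor:
# the search stops at the leftmost operator it can use.
my $leftmost = qr/($rxP1)($rxP2)(.*)/;

# The answer's pattern: earlier triples are consumed by the repeated group,
# so the captures land on the last triple before the end of the string.
my $last_triple = qr/(?:$rxP1$rxP2$rxP3)*($rxP1)($rxP2)($rxP3)$/;

my $sample = '100 like = != not like >product(getProduct_abc) in (Xyz3)';
for my $pair (['leftmost', $leftmost], ['last triple', $last_triple]) {
    my ($name, $rx) = @$pair;
    if ($sample =~ $rx) {
        print "$name: 1='$1' 2='$2' 3='$3'\n";
    }
}
```

The leftmost version captures '100' and ' like ', leaving everything else in group 3, while the answer's pattern skips the earlier operators and captures 'product(getProduct_abc)', ' in ' and '(Xyz3)'.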
