What is a Perl regex for finding the first non-consecutively-repeating character in a string?

Developer https://www.devze.com 2022-12-24 23:45 Source: Internet

Related topics: perl regex

Your task, should you choose to accept it, is to write a Perl regular expression that, for a given string, returns the first character that is not consecutively duplicated. In other words, a character that is both preceded AND followed by characters different from itself (or by the start/end of the string, respectively).

Example:

IN: aabbcdecc
OUT: c

Please note that "not consecutively duplicated" does not mean "not duplicated anywhere in the string": in the example, c appears again later in the string but is still the answer.

NOTE: it must be a pure regex expression. E.g. the solution that obviously comes to mind (clone the string, delete all the duplicates, and print the first remaining character) does not count, although it solves the problem.

The question is inspired by my somewhat off-topic answer to this: How can I find the first non-repeating character in a string using Perl?
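For checking candidate patterns, the definition in the question can be written as a plain scan. This is not a regex answer, only a reference to compare against; the sub name first_single_scan is made up for this sketch.

```perl
use strict;
use warnings;

# Reference (non-regex) scan: return the first character whose neighbours
# on both sides differ from it, or undef if there is none.
# The name first_single_scan is hypothetical, chosen for this sketch.
sub first_single_scan {
    my ($str) = @_;
    my @ch = split //, $str;
    for my $i (0 .. $#ch) {
        next if $i > 0    && $ch[$i - 1] eq $ch[$i];   # same as predecessor
        next if $i < $#ch && $ch[$i + 1] eq $ch[$i];   # same as successor
        return $ch[$i];
    }
    return;   # every character sits in a run of two or more
}

print first_single_scan('aabbcdecc'), "\n";   # the question's example
```

For the question's example this prints c, the character at the fifth position, even though c shows up again at the end of the string.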

(?:(.)\1+)*(.?)

Take the second capture group ($2). It will be an empty string if every character is consecutively duplicated.
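The pattern above could be wrapped in a small sub so it is easier to call than the shell one-liners. This is only a sketch: the name first_single_re is invented here, and the /s flag is an addition so that "." also matches a newline.

```perl
use strict;
use warnings;

# Wraps the pattern above. first_single_re is a hypothetical name, and
# the /s flag is an assumption added so "." also matches a newline.
sub first_single_re {
    my ($str) = @_;
    # (?:(.)\1+)*  skips leading runs of two or more identical characters
    # (.?)         captures the next character, or nothing at end of string
    return $str =~ /(?:(.)\1+)*(.?)/s ? $2 : '';
}

printf "%-10s => '%s'\n", $_, first_single_re($_)
    for qw(abc aabbcc aabcc aabbcdecc);
```

Because (.?) is allowed to match nothing, the pattern always succeeds at the start of the string, so the empty string is how "no such character" shows up.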

Test cases:

~:2434$ perl -e "\"abc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
a
~:2435$ perl -e "\"aabbcc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"

~:2436$ perl -e "\"aabbc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
c
~:2437$ perl -e "\"aabcc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
b
~:2438$ perl -e "\"aabcbbbcccccc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
b
~:2439$ perl -e "\"aabbvbbcccccc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
v
~:2440$ perl -e "\"aabbcdecc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
c
~:2441$ perl -e "\"aabbccddeef\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
f
~:2442$ perl -e "\"faabbccddeef\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
f
~:2443$ perl -e "\"faabbccddeefax\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
f
~:2444$ perl -e "\"xfaabbccddeefx\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
x
~:2445$ perl -e "\"xabcdefghai\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
x
~:2446$ perl -e "\"cccdddeeea12345\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
a
~:2447$ perl -e "\"1234a5678a23\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
1

Or the following, which will not match at all if every character is consecutively duplicated (the character is again in the second capture):

(?:^|(.)(?!\1))(.)(?!\2)
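A minimal sketch of how this lookahead version might be called. The sub name first_single_lookahead is made up, the /s flag is an addition, and returning undef on a failed match is a choice made for this sketch.

```perl
use strict;
use warnings;

# Wraps the lookahead pattern above. The name first_single_lookahead and
# the /s flag are assumptions made for this sketch.
sub first_single_lookahead {
    my ($str) = @_;
    # (?:^|(.)(?!\1))  either the start of the string, or a character that
    #                  is not followed by a copy of itself
    # (.)(?!\2)        then the candidate, which must not be followed by
    #                  a copy of itself either
    return $str =~ /(?:^|(.)(?!\1))(.)(?!\2)/s ? $2 : undef;
}

for my $s (qw(aabbcdecc faabbccddeef aabbcc)) {
    my $r = first_single_lookahead($s);
    print "$s => ", defined $r ? $r : '(no match)', "\n";
}
```

Unlike the first pattern, a failed match here is distinguishable from a match on an empty capture, which is why the sketch returns undef rather than an empty string.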

use 5.010;
$str=~/^(([a-z])\g{-1}+)*(?<c>[a-z])/i;
$char = $+{c};
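One thing that may be worth checking with the named-capture version is what happens when every character is duplicated: the final [a-z] is mandatory, so the engine can back out of the last run to satisfy it. The sketch below compares the posted pattern with a possessive variant ("*+", available from Perl 5.10). The variant and both sub names are assumptions of this sketch, not part of the answer above.

```perl
use strict;
use warnings;
use 5.010;   # named captures, \g{-1}, possessive quantifiers

# Both sub names are hypothetical, chosen for this sketch.

# The pattern as posted: letters only, anchored at the start.
sub first_single_named {
    my ($str) = @_;
    return $str =~ /^(([a-z])\g{-1}+)*(?<c>[a-z])/i ? $+{c} : undef;
}

# Variant (an assumption, not from the post): "*+" makes the run-skipping
# loop possessive, so a run that was consumed is never given back.
sub first_single_possessive {
    my ($str) = @_;
    return $str =~ /^(?:([a-z])\g{-1}+)*+(?<c>[a-z])/i ? $+{c} : undef;
}

for my $s (qw(aabbcdecc aabb)) {
    say "$s: posted=", first_single_named($s) // 'no match',
        " possessive=", first_single_possessive($s) // 'no match';
}
```

Both versions only accept letters, so a string that starts with a digit does not match at all.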

I wish Perl had a regex negate flag! That is, one that returns all the characters that do NOT match /regex/.

What you are looking for is really the regex capture complement of:

m/(.)(\1)+/
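To see what that pattern covers, the runs it matches can be listed with /g; whatever lies outside them is the complement being asked for. A rough sketch, where the helper name runs_in is invented and an extra outer group is added so the whole run can be captured:

```perl
use strict;
use warnings;

# Lists every run of two or more identical characters in a string.
# runs_in is a hypothetical helper name; the outer capture group is an
# addition to the pattern above so the whole run ends up in $1.
sub runs_in {
    my ($str) = @_;
    my @runs;
    while ($str =~ /((.)\2+)/g) {
        push @runs, $1;   # $1 is the whole run, $2 its repeated character
    }
    return @runs;
}

print join(',', runs_in('aabbcdecc')), "\n";
```

For aabbcdecc the runs are aa, bb and cc, which leaves c, d and e outside them, and the first of those is the answer the question wants.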

I tried all the suggestions on this page against Brian's data list (the result of in his program listing). None work completely.

The regex of:

(?:^|(.)(?!\1))(.)(?!\2)

fails to match the leading 'f' in lines 2 and 3. Brian's does not match the 'f' at the beginning of lines 2 and 3, or any of the singletons at the end of line 5.

The regex of:

$str=~/^(([a-z])\g{-1}+)*(?<c>[a-z])/i;
$char = $+{c};

does work.

The only single regex that I found is a simple one:

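#!/usr/bin/perl
use strict;
use warnings;

# For each test string, delete every run of two or more identical
# characters, then report the first character that is left.
while ( <DATA> ) {
    chomp;
    print "BEFORE: $_\n";
    s/(.)(\1)+//g;    # remove consecutive duplicates
    print "AFTER: $_\n";
    print "character: " . substr($_, 0, 1) . "\n\n";
}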
__END__
aabbccddeef
faabbccddeef
faabbccddeefax
xfaabbccddeefx
xabcdefghai
cccdddeeea12345
1234a5678a23
aabbcdecc
abcdefg
aabbccddeef
cccdddeeea12345

This works in the simple case of "give the first character." (Edit: on rereading, I see that the obvious delete-the-duplicates approach is not what you were looking for.)

I'd love to hear if there is a better solution.