In the "Advanced Regular Expressions" chapter in Mastering Perl, I have a broken example for which I can't figure out a nice fix. The example is perhaps trying to be too clever for its own good, but maybe someone can fix it for me. There could be a free copy of the book in it for working fixes. :)
In the section talking about lookarounds, I wanted to use a negative lookbehind to implement a commifying routine for numbers with fractional portions. The point was to use a negative lookbehind because that was the topic.
I stupidly did this:
$_ = '$1234.5678';
s/(?<!\.\d)(?<=\d)(?=(?:\d\d\d)+\b)/,/g; # $1,234.5678
The (?<!\.\d) asserts that the bit before the (?=(?:\d\d\d)+\b) is not a decimal point and a digit.
The stupid thing is not trying hard enough to break it. Add another digit to the end, and there is now a group of three digits that is not preceded by a decimal point and a digit:
$_ = '$1234.56789';
s/(?<!\.\d)(?<=\d)(?=(?:\d\d\d)+\b)/,/g; # $1,234.56,789
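To see where the pattern starts to fail, here is a minimal sketch that runs it over fractions of increasing length. The helper name broken_commify is made up for this illustration; the substitution itself is the one above, unchanged.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the question.
sub broken_commify {
    my ($n) = @_;
    $n =~ s/(?<!\.\d)(?<=\d)(?=(?:\d\d\d)+\b)/,/g;
    return $n;
}

# Four fractional digits are fine; five or six are not.
for my $n ('$1234.5678', '$1234.56789', '$1234.567890') {
    printf "%-14s -> %s\n", $n, broken_commify($n);
}
```

The fixed-width lookbehind only protects the position right after the first fractional digit, so any group of three digits further to the right is still fair game.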
If lookbehinds could be variable width in Perl, this would have been really easy. But they can't.
Note that it's easy to do this without a negative lookbehind, but that's not the point of the example. Is there a way to salvage this example?
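For comparison, one possible way to do it without a negative lookbehind is to split off the fraction first and commify only the integer part. This is just a sketch: the helper name commify_plain is invented here, and it assumes the string holds a single number with at most one decimal point.

```perl
use strict;
use warnings;

# Hypothetical helper: commify only the integer part and leave any
# fractional digits untouched. No negative lookbehind involved.
sub commify_plain {
    my ($n) = @_;
    # Split at the first decimal point; $frac is undef if there is none.
    my ($int, $frac) = split /\./, $n, 2;
    # Insert a comma wherever a digit is followed by one or more groups
    # of three digits running to the end of the integer part.
    $int =~ s/(?<=\d)(?=(?:\d{3})+\z)/,/g;
    return defined $frac ? "$int.$frac" : $int;
}

print commify_plain('$1234.56789'), "\n";   # $1,234.56789
```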
I don't think it's possible without some form of variable-width look-behind. The addition of the \K assertion in 5.10 provides a way of faking variable-width positive look-behind. What we really need is variable-width negative look-behind but with a little creativity and a lot of ugliness we can make it work:
use 5.010;
$_ = '$1234567890.123456789';
s/(?<!\.)(?:\b|\G)\d+?\K(?=(?:\d\d\d)+\b)/,/g;
say; # $1,234,567,890.123456789
If there was ever a pattern that begged for the /x notation it's this one:
s/
    (?<!\.)         # Negative look-behind assertion; we don't want to match
                    # digits that come after the decimal point.
    (?:             # Begin a non-capturing group; the contents anchor the \d
                    # which follows so that the assertion above is applied at
                    # the correct position.
        \b          # Either a word boundary (the beginning of the number)...
      |             # or (because \b won't match at subsequent positions where
                    # a comma should go)...
        \G          # the position where the previous match left off.
    )               # End anchor grouping
    \d+?            # One or more digits, non-greedily so the match proceeds
                    # from left to right. A greedy match would proceed from
                    # right to left, the \G above wouldn't work, and only the
                    # rightmost comma would get placed.
    \K              # Keep the preceding stuff; used to fake variable-width
                    # look-behind
                    # <- This is what we match! (i.e. a position, no text)
    (?=             # Begin a positive look-ahead assertion
        (?:\d\d\d)+ # A multiple of three digits (3, 6, 9, etc.)
        \b          # A word (digit) boundary to anchor the triples at the
                    # end of the number.
    )               # End positive look-ahead assertion.
/,/xg;
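A quick way to check the \K substitution against the inputs that broke the original is a small harness like the following. It is only a sketch; the wrapper name commify_k is made up for this illustration, and the pattern is the one-line form shown earlier.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical wrapper around the \K substitution shown above.
sub commify_k {
    my ($n) = @_;
    $n =~ s/(?<!\.)(?:\b|\G)\d+?\K(?=(?:\d\d\d)+\b)/,/g;
    return $n;
}

# The two inputs from the question, plus the longer one used above.
say commify_k($_)
    for '$1234.5678', '$1234.56789', '$1234567890.123456789';
```

The fractional digits should come through untouched in all three cases, because no match can start right after the decimal point and \G never lands inside the fraction.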
If you have to post on Stack Overflow asking if somebody can figure out how to do this with negative lookbehind, then it's obviously not a good example of negative lookbehind. You'd be better off thinking up a new example rather than trying to salvage this one.
In that spirit, how about an automatic spelling corrector?
s/(?<![Cc])ei/ie/g; # Put I before E except after C
(Obviously, that's not a hard and fast rule in English, but I think it's a more realistic application of negative lookbehind.)
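As a rough illustration of how that substitution behaves, here is a sketch that runs it over a few words. The word list and the helper name fix_ei are my own choices, not part of the suggestion above.

```perl
use strict;
use warnings;

# Hypothetical helper applying the "I before E except after C" rule.
sub fix_ei {
    my ($word) = @_;
    $word =~ s/(?<![Cc])ei/ie/g;
    return $word;
}

# "beleive" and "freind" get fixed, "receive" is left alone because the
# "ei" follows a C, and "weird" is wrongly changed to "wierd".
for my $word (qw(beleive freind receive weird)) {
    printf "%-8s -> %s\n", $word, fix_ei($word);
}
```

The last word shows why the rule is not hard and fast, but the lookbehind itself is doing exactly what it says.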
I don't think this is what you are after (especially because the negative look-behind assertion has been dropped), but I guess your only option is to slurp up the decimal places like in this example:
s/
    (?:
        (?<=\d)
        (?=(?:\d\d\d)+\b)
      |
        ( \d{0,3} \. \d+ )
    )
/ $1 ? $1 : ',' /exg;
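To try the slurping approach on a couple of inputs, something like this sketch should do. The wrapper name commify_slurp is invented for the illustration; the substitution is the one above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the slurping substitution shown above.
sub commify_slurp {
    my ($n) = @_;
    $n =~ s/
        (?:
            (?<=\d)
            (?=(?:\d\d\d)+\b)       # a comma position in the integer part
          |
            ( \d{0,3} \. \d+ )      # or swallow the decimal point and fraction
        )
    / $1 ? $1 : ',' /exg;
    return $n;
}

print commify_slurp($_), "\n" for '$1234.56789', '$1234567.56789';
```

Because the second alternative consumes the whole fraction and puts it back unchanged, the comma alternative never gets a chance to match inside it.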
P.S. I think it is a good example when not used as the first one in the book, as it demonstrates some of the pitfalls and limitations of look-around assertions.