I have the following sample expression that I'm passing to egrep over a word list:
^([a-z])lu([a-z])\2er$
I'd like to further stipulate that the content of \1 and \2 must be different, e.g. this would match "bluffer" but not "blubber". Is there a way to build this into the expression itself (so I can get my results right from egrep or something like it), or am I stuck doing this in some real language with regular expression support and manually checking that none of my groups are the same?
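You could add the negative lookahead (?!\1) in front of the second match group. The following regex:
([a-z])lu(?!\1)([a-z])\2er
matches "bluffer" but not "blubber". This works cleanly here because both groups match exactly one character; if the groups could match different numbers of characters, the lookahead would only reject text that begins with the content of \1. Note that lookahead is not part of the POSIX extended syntax that egrep uses, so this needs a Perl-compatible engine, for example GNU grep with the -P option.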
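As a quick way to check the lookahead version, here is a minimal sketch in Python, whose re module supports (?!...). The names PATTERN and matches and the sample word list are made up for illustration, and the ^ and $ anchors are taken from the expression in the question rather than from the regex above.

```python
import re

# The lookahead (?!\1) fails if the next character equals group 1,
# so group 2 is forced to capture a different letter.
PATTERN = re.compile(r"^([a-z])lu(?!\1)([a-z])\2er$")

def matches(word: str) -> bool:
    """Return True if the word fits the pattern with distinct groups."""
    return PATTERN.match(word) is not None

if __name__ == "__main__":
    # Hypothetical sample words standing in for a real word list.
    words = ["bluffer", "blubber", "fluffer", "slugger"]
    print([w for w in words if matches(w)])  # ['bluffer', 'slugger']
```

The same pattern should behave the same way in any engine with Perl-style lookahead.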
You need something more powerful. Classic regular expressions have no direct way to assert that two groups differ. Sed could probably do what you need.
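The two-step approach mentioned in the question (match first, then compare the groups by hand) might look like the following sketch. It uses Python rather than sed, and BASE and distinct_groups are hypothetical names chosen for this example.

```python
import re

# The original expression from the question, with no lookahead.
BASE = re.compile(r"^([a-z])lu([a-z])\2er$")

def distinct_groups(word: str) -> bool:
    """Match the base pattern, then check by hand that the groups differ."""
    m = BASE.match(word)
    return m is not None and m.group(1) != m.group(2)

if __name__ == "__main__":
    # Hypothetical sample words standing in for a real word list.
    for word in ["bluffer", "blubber"]:
        print(word, distinct_groups(word))
```

This does not depend on lookahead support, at the cost of moving the check out of the expression itself.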