I have been banging my head against this for some time now:
I want to capture all [a-z]+[0-9]? character sequences, excluding strings such as sin|cos|tan etc.
So having done my regex homework the following regex should work:
(?:(?!(sin|cos|tan)))\b[a-z]+[0-9]?
As you can see, I am using negative lookahead along with alternation. The \b after the non-capturing group's closing parenthesis is critical to avoid matching the in of sin, etc. The regex makes sense, and as a matter of fact I have tried it in RegexBuddy with Java as the target implementation and got the wanted result, but it doesn't work using Java Matcher and Pattern objects!
Any thoughts?
cheers
The \b is in the wrong place. It would be looking for a word boundary that didn't have sin/cos/tan before it. But a boundary just after any of those would have a letter before it, so it would have to be an end-of-word boundary, which it can't be if the next character is a-z.
Also, the negative lookahead would (if it worked) exclude strings like cost, which I'm not sure you want if you're just filtering out keywords.
I suggest:
\b(?!sin\b|cos\b|tan\b)[a-z]+[0-9]?\b
Or, more simply, you could just match \b[a-z]+[0-9]?\b and filter out the strings in the keyword list afterwards. You don't always have to do everything in regex.
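For what it's worth, here is a minimal sketch of the match-then-filter approach in Java. The class name, the method name and the keyword set are just placeholders I made up for illustration, and Set.of assumes Java 9 or later. Note that the backslashes are doubled inside the Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentifierFilter {
    // Hypothetical keyword list; extend it with whatever names you need to skip.
    private static final Set<String> KEYWORDS = Set.of("sin", "cos", "tan");

    // In Java source, "\\b" is the regex word boundary \b.
    private static final Pattern IDENT = Pattern.compile("\\b[a-z]+[0-9]?\\b");

    // Collects every whole-word match that is not exactly one of the keywords.
    public static List<String> identifiers(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = IDENT.matcher(input);
        while (m.find()) {
            String token = m.group();
            if (!KEYWORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "cost" is kept because only exact keywords are filtered out.
        System.out.println(identifiers("sin x1 + cost * tan y")); // prints [x1, cost, y]
    }
}
```

The filter compares whole tokens, so it behaves like the sin\b|cos\b|tan\b lookahead above: cost and sin1 are kept, while a bare sin is dropped.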
So you want [a-z]+[0-9]? (a sequence of at least one letter, optionally followed by a digit), unless that letter sequence resembles one of sin, cos, tan?
\b(?!(sin|cos|tan)(?=\d|\b))[a-z]+\d?\b
Results:
cos - no match
cosy - full match
cos1 - no match
cosy1 - full match
bla9 - full match
bla99 - no match
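If you want to reproduce those results in Java, something like the following sketch should do it. The class and method names are made up for the example; the pattern is the one above with the backslashes doubled for the string literal, and matches() is used so that only a match covering the whole input counts.

```java
import java.util.regex.Pattern;

public class KeywordLookaheadDemo {
    // Same regex as above; every backslash is doubled for the Java string literal.
    private static final Pattern P =
        Pattern.compile("\\b(?!(sin|cos|tan)(?=\\d|\\b))[a-z]+\\d?\\b");

    // True only if the pattern matches the entire input string.
    public static boolean fullMatch(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"cos", "cosy", "cos1", "cosy1", "bla9", "bla99"};
        for (String s : samples) {
            System.out.println(s + " - " + (fullMatch(s) ? "full match" : "no match"));
        }
    }
}
```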
I forgot to escape the \b for Java, so \b should be \\b, and it now works.
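A small sketch of why this bites, assuming the pattern was written as a plain Java string literal: inside a literal, "\b" is the backspace character (U+0008), so the regex engine never sees a word boundary and the pattern compiles without any error. The class and helper names below are made up for the demonstration.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // "\b" here is a backspace character, so this pattern expects a literal
    // backspace followed by letters and an optional digit.
    public static final Pattern WRONG = Pattern.compile("\b[a-z]+[0-9]?");

    // "\\b" reaches the regex engine as \b, a word boundary.
    public static final Pattern RIGHT = Pattern.compile("\\b[a-z]+[0-9]?");

    // True if the pattern matches anywhere in the input.
    public static boolean finds(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(finds(WRONG, "abc1")); // prints false
        System.out.println(finds(RIGHT, "abc1")); // prints true
    }
}
```

That would also explain why a tool that takes the raw regex accepts it while the same text pasted into Java source does not.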
cheers