I'm using Python's re module to try to build a regular expression that finds all camel-cased words not starting with an exclamation point (!).
Here is what I have:
(?<![!])([A-Z]?[a-z]+[A-Z][a-zA-Z]+)
The negative lookbehind assertion is only being applied to the first [A-Z] block instead of everything within the parentheses, as I expected. Is there any way to apply the negative lookbehind assertion so that it works on the whole group?
If that is not possible, does anyone have any suggestions for what I can do instead?
I need to match (and eventually replace) all camel cased words. The way I am defining Camel Cased is as follows:
- Any word starting with either a single uppercase letter or a lowercase letter
- One or more lowercase letters
- An uppercase letter
- One or more lowercase letters
In other words, any word starting with only one uppercase letter followed by one or more lowercase letters followed by an uppercase letter followed by one or more lowercase letters.
All of that is easy to match. The problem becomes apparent when I need to check whether the word starts with an exclamation point (!). The goal is to find all words not starting with that symbol.
Example:
- The regular expression should match:
HelloWorld
- The regular expression should not match:
!HelloWorld
In a sentence like this: "Welcome to MyWorld! We have !CoolStuff here!" I should be able to extract MyWorld, but not CoolStuff.
Thanks for your help, -Sunjay03
[EDIT:] Here is a string where it does not work:
"This is an example of !HelloWorld. Click that link FOO! Also, check out my iPods"
The regular expression extracts the following:
['elloWorld', 'iPods']
Solution: (?<![!])\b([A-Z]?[a-z]+[A-Z][a-zA-Z]+)
Thanks to JBernardo for his tip. This solution works because \b forces the match to start at a word boundary, and the lookbehind then rejects any boundary that is immediately preceded by an exclamation point.
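For anyone who wants to check this, here is a minimal sketch comparing the original pattern with the fixed one on the two sample strings from the question. The variable names are only illustrative.

```python
import re

# Original pattern: the lookbehind only rejects a match that starts right
# after "!", so the engine simply starts the match one character later.
original = re.compile(r'(?<![!])([A-Z]?[a-z]+[A-Z][a-zA-Z]+)')

# Fixed pattern: \b forces the match to begin at the start of a word,
# so skipping ahead by one character is no longer possible.
fixed = re.compile(r'(?<![!])\b([A-Z]?[a-z]+[A-Z][a-zA-Z]+)')

text1 = "This is an example of !HelloWorld. Click that link FOO! Also, check out my iPods"
text2 = "Welcome to MyWorld! We have !CoolStuff here!"

original_hits = original.findall(text1)
fixed_hits_1 = fixed.findall(text1)
fixed_hits_2 = fixed.findall(text2)

print(original_hits)  # ['elloWorld', 'iPods']
print(fixed_hits_1)   # ['iPods']
print(fixed_hits_2)   # ['MyWorld']
```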
re.findall(r'(?<![!])\b\w+', ' !Hai Yo!')
And the result is ['Yo']
BTW, just replace the \w+ with your own validation, but keep the \b.
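Since the question mentions eventually replacing the matches, here is a rough sketch of that step with the camel-case validation put in place of \w+. The function name mark_camel_words and the square-bracket replacement are assumptions made up for illustration, not something from the question.

```python
import re

# Lookbehind + \b from above, with the camel-case pattern in place of \w+.
CAMEL = re.compile(r'(?<![!])\b([A-Z]?[a-z]+[A-Z][a-zA-Z]+)')

def mark_camel_words(text):
    """Wrap each camel-cased word not preceded by "!" in square brackets.

    The bracket format is a hypothetical replacement chosen for the demo.
    """
    return CAMEL.sub(r'[\1]', text)

result = mark_camel_words("Welcome to MyWorld! We have !CoolStuff here!")
print(result)  # Welcome to [MyWorld]! We have !CoolStuff here!
```

Swap the replacement string (or pass a function to sub) for whatever output you actually need.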
It looks like the following will meet your requirement:
>>> import re
>>> reg=r'[^!]\b([a-zA-Z][a-z]+[A-Z][a-zA-Z]+)\b'
>>> text="Welcome to MyWorld! We have !CoolStuff here YouAgree?"
>>> re.findall(reg, text)
['MyWorld', 'YouAgree']
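One caveat worth noting: [^!] has to consume a real character, so this pattern cannot match a camel-cased word at the very start of the string. A small sketch of the difference, assuming a lookbehind variant of the same pattern (the sample string is made up):

```python
import re

# Pattern from this answer: [^!] must consume one character before the word.
consuming = r'[^!]\b([a-zA-Z][a-z]+[A-Z][a-zA-Z]+)\b'
# Variant using a zero-width lookbehind instead, so nothing is consumed.
lookbehind = r'(?<!!)\b([a-zA-Z][a-z]+[A-Z][a-zA-Z]+)\b'

text = "MyWorld is here"  # camel-cased word at position 0

consuming_hits = re.findall(consuming, text)
lookbehind_hits = re.findall(lookbehind, text)

print(consuming_hits)   # []
print(lookbehind_hits)  # ['MyWorld']
```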