I would like to check whether the greater-than sign is preceded by a less-than sign. What I really need is to check whether there is more than one word, separated by spaces, between the < and the >.
For example:
<a v >
should be found, because there is more than one "word" inside,
and this:
< a >
should not.
Here is my Python code:
import re

text = '<a > b'
if re.search(r'(?<!<)[a-zA-Z0-9_ ]+>', text):  # search for '>'
    print("found a match")
For this text I don't want a match, because there is a less-than sign before it. But it does find a match; the negative lookbehind does not seem to be working.
Solution (kind of): this also finds a less-than symbol that is not followed by a greater-than symbol.
match = re.search(r'<?[a-zA-Z0-9_ ]+>', text)
if match and match.group(0)[0] != '<':
    print("found >")
match = re.search(r'<[a-zA-Z0-9_ ]+>?', text)
if match and match.group(0)[-1] != '>':
    print("found <")
Thanks homson_matt for the solution.
Better solution:
Replace the strings that cause the problem before looking for the greater-than and less-than symbols.
# Replace all templates in the source hunk ( <TEMPLATE> )
srcString = re.sub(r"<[ ]*[a-zA-Z0-9_\*:/\.]+[ ]*>", "TEMPLATE", srcString)
if re.search(r'[a-zA-Z0-9_ )]>', srcString):  # search for '>'
    return True
if re.search(r'<[a-zA-Z0-9_ (]', srcString):  # search for '<'
    return True
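The fragment above uses return, so it presumably lives inside a function. A minimal self-contained sketch of that idea follows; the function name has_stray_angle_bracket and the sample strings are hypothetical, chosen only for illustration, and the patterns are the ones from the fragment.

```python
import re

# Same template pattern as above: a single token wrapped in < >,
# optionally padded with spaces.
TEMPLATE_RE = re.compile(r'<[ ]*[a-zA-Z0-9_*:/.]+[ ]*>')


def has_stray_angle_bracket(src_string):
    """Return True if a '<' or '>' remains after templates are replaced.

    Hypothetical helper name; the logic mirrors the fragment above.
    """
    stripped = TEMPLATE_RE.sub('TEMPLATE', src_string)
    if re.search(r'[a-zA-Z0-9_ )]>', stripped):  # search for '>'
        return True
    if re.search(r'<[a-zA-Z0-9_ (]', stripped):  # search for '<'
        return True
    return False


print(has_stray_angle_bracket('<a > b'))         # False
print(has_stray_angle_bracket('a > b'))          # True
print(has_stray_angle_bracket('vector<int> v'))  # False
```

Replacing the templates first means the two remaining searches no longer need a lookbehind at all.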
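The match is " >" (the space followed by >). The negative lookbehind only checks the single character immediately before the point where the match starts. A match starting at "a" is rejected because "a" is preceded by <, but the engine then tries starting at the space, which is preceded by "a" rather than <. The space matches the bit in square brackets, and then there's a >.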
Are you trying to match the whole string? If you are, try re.match instead of re.search.
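A quick way to see what is going on is to print the matched substring and its start position. This is only a sketch using the text and pattern from the question. Note that re.match anchors at the start of the string only, not at the end.

```python
import re

text = '<a > b'
pattern = r'(?<!<)[a-zA-Z0-9_ ]+>'

# re.search tries every start position; the lookbehind is only checked
# at the position where a candidate match begins.
found = re.search(pattern, text)
print(repr(found.group(0)), found.start())  # ' >' 2

# re.match only tries position 0, where '<' is not in the character class.
anchored = re.match(pattern, text)
print(anchored)  # None
```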
Or you might want to try this code. It searches for a substring that might start with <, and then decides if it actually does.
import re

text = '<a > b'
match = re.search(r'<?[a-zA-Z0-9_ ]+>', text)
if match and match.group(0)[0] != '<':
    # Match found: a '>' that is not opened by a '<'
    print("found a match")
I think this is what you're looking for:
r'<\s*\w+(?:\s+\w+)+\s*>'
\w+ matches the first word, then (?:\s+\w+)+ matches one or more additional words, separated by whitespace. If you don't want the match to span multiple lines, you can change \s to a literal space:
r'< *\w+(?: +\w+)+ *>'
...or to a character class for horizontal whitespace only (i.e., TAB or space characters):
r'<[ \t]*\w+(?:[ \t]+\w+)+[ \t]*>'
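As a rough check, the first pattern can be run against the examples from the question. The variable names and the extra sample strings here are made up for illustration.

```python
import re

# Two or more whitespace-separated words between < and >
multi_word_tag = re.compile(r'<\s*\w+(?:\s+\w+)+\s*>')

samples = ['<a v >', '< a >', '<a > b', '< foo bar baz>']
results = {s: bool(multi_word_tag.search(s)) for s in samples}

for s in samples:
    print(repr(s), results[s])
# '<a v >' True
# '< a >' False
# '<a > b' False
# '< foo bar baz>' True
```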