
Pattern matching: check whether a greater-than symbol is not preceded by a less-than symbol

Developer https://www.devze.com 2023-03-14 02:49 Source: web
I would like to check whether a greater-than sign is preceded by a less-than sign. What I really need is to check whether there is more than one word, separated by spaces, between the < and the >.

For example:

<a v >

should be found, because there is more than one "word" inside,

and this:

< a >

should not.

Here is my Python code:

import re

text = '<a > b'
if re.search(r'(?<!<)[a-zA-Z0-9_ ]+>', text):   # search for '>'
    print("found a match")

For this text I don't want a match, because there is a less-than sign before the >. But it does find a match; the negative lookbehind does not seem to be working.

Solution (kind of): this also finds a less-than symbol that is not closed by a greater-than symbol.

match = re.search(r'<?[a-zA-Z0-9_ ]+>', text)
if match and match.group(0)[0] != '<':
    print("found >")
match = re.search(r'<[a-zA-Z0-9_ ]+>?', text)
if match and match.group(0)[-1] != '>':
    print("found <")

Thanks to homson_matt for the solution.

BETTER SOLUTION:

Replace the strings that cause the problem (the template-like spans) before looking for the greater-than and less-than symbols.

# replace all templates in the source hunk ( <TEMPLATE> )
srcString = re.sub(r"<[ ]*[a-zA-Z0-9_\*:/\.]+[ ]*>", "TEMPLATE", srcString)
if re.search(r'[a-zA-Z0-9_ )]>', srcString):  # search for '>'
    return True
if re.search(r'<[a-zA-Z0-9_ (]', srcString):  # search for '<'
    return True
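
The fragment above uses return, so it presumably lives inside a function that the post does not show. A minimal runnable sketch of that idea follows; the function name has_stray_angle_bracket and the sample strings are assumptions for illustration, not part of the original code.

```python
import re

# Hypothetical helper name; the original post only shows the function body.
def has_stray_angle_bracket(src_string):
    # Replace template-like spans such as "<T>" or "< a >" with a placeholder.
    cleaned = re.sub(r"<[ ]*[a-zA-Z0-9_\*:/\.]+[ ]*>", "TEMPLATE", src_string)
    if re.search(r"[a-zA-Z0-9_ )]>", cleaned):   # a '>' is still present
        return True
    if re.search(r"<[a-zA-Z0-9_ (]", cleaned):   # a '<' is still present
        return True
    return False

print(has_stray_angle_bracket("< a >"))
print(has_stray_angle_bracket("<a v >"))
print(has_stray_angle_bracket("a > b"))
```

With these inputs the single-word span is swallowed by the substitution, while the two-word span and the bare comparison still leave a bracket behind for the second pass to find.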


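The match is " >" (the space followed by >). The negative lookbehind only rejects a start position that comes immediately after <, so the match cannot start at "a". The regex engine simply moves on one character and starts at the space instead, which is not preceded by <. That section matches your regex perfectly: the space matches the bit in square brackets, and then there's a >.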
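
To see exactly which substring the question's pattern picks up, it can help to print the match and where it starts. A minimal sketch using the pattern and text from the question (the variable names are only illustrative):

```python
import re

text = '<a > b'
pattern = r'(?<!<)[a-zA-Z0-9_ ]+>'

match = re.search(pattern, text)
matched = match.group(0) if match else None
start = match.start() if match else None

# Show the matched substring and the index where it begins.
print(repr(matched), start)
```

The reported match begins at index 2, the space after "a": the position right after < is rejected by the lookbehind, so the search just starts one character later.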

Are you trying to match from the start of the string? If you are, try re.match instead of re.search; re.match only succeeds if the pattern matches at the very beginning of the string.
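
As a rough illustration of the difference, the sketch below runs the question's pattern through both functions. The second input string, without a leading <, is an assumption added for comparison.

```python
import re

pattern = r'(?<!<)[a-zA-Z0-9_ ]+>'

search_hit = re.search(pattern, '<a > b')   # may start anywhere in the string
match_hit = re.match(pattern, '<a > b')     # must start at index 0
plain_hit = re.match(pattern, 'a > b')      # no leading '<', so index 0 works

print(search_hit is not None, match_hit is not None, plain_hit is not None)
```

Because re.match cannot skip ahead, the leading < blocks the match for the first string, while the string without it still matches.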

Or you might want to try this code. It searches for a substring that might start with <, and then decides if it actually does.

import re

text = '<a > b'
match = re.search(r'<?[a-zA-Z0-9_ ]+>', text)

if match and match.group(0)[0] != '<':
    print("found >")  # match found


I think this is what you're looking for:

r'<\s*\w+(?:\s+\w+)+\s*>'

\w+ matches the first word, then (?:\s+\w+)+ matches one or more additional words, separated by whitespace. If you don't want the match to span multiple lines, you can change \s to a literal space:

r'< *\w+(?: +\w+)+ *>'

...or to a character class for horizontal whitespace only (i.e., TAB or space characters):

r'<[ \t]*\w+(?:[ \t]+\w+)+[ \t]*>'
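
A quick way to check this pattern against the examples from the question is to run it over a few sample strings. The sketch below uses the horizontal-whitespace variant; the name multi_word_tag and the last two sample strings are assumptions for illustration.

```python
import re

# Horizontal-whitespace variant of the pattern suggested above.
multi_word_tag = re.compile(r'<[ \t]*\w+(?:[ \t]+\w+)+[ \t]*>')

samples = ['<a v >', '< a >', '<a > b', 'x <foo bar baz> y']
results = {s: bool(multi_word_tag.search(s)) for s in samples}

for s, found in results.items():
    print(repr(s), found)
```

Only the strings with two or more words between the brackets are reported as found; the single-word cases are not.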