I have a string; let's call it 'test'. I want to check whether a line matches this string, but compare it only against the backref (the captured group) of a regex.
Can I do something like this:
import re
for line in f.readlines():
    if '<a href' in line:
        if re.match('<a href="(.*)">', line) == 'test':
            print('matched!')
This, of course, doesn't work, but I think I might be close. Basically, the question is: how can I get re to return only the backref for comparison?
re.match matches only at the beginning of the string.
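The comparison in the question never succeeds because re.match returns a match object (or None), not the captured text. A minimal sketch, reusing the question's own pattern: the backref is read with .group(1), and only that string can sensibly be compared with 'test'.

```python
import re

line = '<a href="test">'
m = re.match(r'<a href="(.*)">', line)

# The match object itself is never equal to a plain string...
print(m == 'test')           # False
# ...but its first capturing group (the "backref") is the captured text.
print(m.group(1))            # test
print(m.group(1) == 'test')  # True
```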
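import re

def url_match(line, url):
    # re.match only tries to match at the very start of `line`
    match = re.match(r'<a href="(?P<url>[^"]*?)"', line)
    # compare the captured group, not the match object; False (not None) when nothing matched
    return match is not None and match.group('url') == url
Example usage: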
>>> url_match('<a href="test">', 'test')
True
>>> url_match('<a href="test">', 'te')
False
>>> url_match('this is a <a href="test">', 'test')
False
If the pattern could occur anywhere in the line, use re.search instead.
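import re

def url_search(line, url):
    # re.search scans the whole line for the first match
    match = re.search(r'<a href="(?P<url>[^"]*?)"', line)
    # compare the captured group, not the match object; False (not None) when nothing matched
    return match is not None and match.group('url') == url
Example usage: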
>>> url_search('<a href="test">', 'test')
True
>>> url_search('<a href="test">', 'te')
False
>>> url_search('this is a <a href="test">', 'test')
True
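re.search stops at the first match, so a line holding several links is only checked against the first one. A possible variation, sketched under the assumption that lines may contain more than one link: re.finditer yields every match. The helper name line_has_url and the sample lines (a stand-in for the file f in the question) are hypothetical.

```python
import re

# Same idea as url_search: capture the href value in a named group.
HREF = re.compile(r'<a href="(?P<url>[^"]*)"')

def line_has_url(line, url):
    """Hypothetical helper: True if any <a href="..."> on the line equals `url`."""
    return any(m.group('url') == url for m in HREF.finditer(line))

# Stand-in for the lines of the file `f` from the question.
lines = [
    'no link here\n',
    '<a href="other">x</a> and <a href="test">y</a>\n',
]

for line in lines:
    if line_has_url(line, 'test'):
        print('matched!')
```

Here url_search(lines[1], 'test') would return False, because the first link on that line is "other"; the finditer version also looks at the second link.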
N.B.: If you are trying to parse HTML using a regex, read "RegEx match open tags except XHTML self-contained tags" before going any further.
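If the input really is HTML, one alternative (swapped in here, not part of the original answer) is the standard library's html.parser. A minimal sketch, where the class name HrefCollector is hypothetical: it collects the href of every <a> tag, and unlike the regexes above it also copes with upper-case tags and single-quoted attributes.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Hypothetical helper: collect the href attribute of every <a> tag fed in."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; attrs is a list of (name, value)
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.hrefs.append(value)

parser = HrefCollector()
parser.feed('this is a <a href="test">link</a> and <A HREF=\'other\'>x</a>')
parser.close()
print(parser.hrefs)            # ['test', 'other']
print('test' in parser.hrefs)  # True
```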