I want to search for a regex match in a larger string from a certain position onwards, without using string slices.
My background: I want to search through a string iteratively for matches of various regexes. A natural solution in Python would be to keep track of the current position within the string and use, e.g.,
re.match(regex, largeString[pos:])
in a loop. But for really large strings (~1 MB), slicing as in largeString[pos:] becomes expensive, because each slice copies the rest of the string. I'm looking for a way to avoid that.
Side note: funnily, a niche of the Python documentation mentions an optional pos parameter to the match function (which would be exactly what I want), but the module-level functions themselves don't accept it :-).
The variants with pos and endpos parameters exist only as methods of compiled regular expression objects (what re.compile returns). Try this:
import re

pattern = re.compile("match here")
text = "don't match here, but do match here"
start = text.find(",")  # index 16
# Methods of a compiled pattern accept pos (and endpos), so no slice is made.
print(pattern.search(text, start).span())
This prints (25, 35): the search starts at the comma, so the first "match here" is skipped.
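For the iterative use case described in the question, a minimal sketch might keep a list of compiled patterns and advance pos after each match, calling Pattern.match(text, pos) instead of re.match(regex, text[pos:]). The token names and regexes below (NUMBER, NAME, SPACE) are hypothetical placeholders, not taken from the question.

```python
import re

# Hypothetical token patterns for illustration; substitute your own regexes.
# None of them can match the empty string, so every match advances pos.
TOKEN_PATTERNS = [
    ("NUMBER", re.compile(r"\d+")),
    ("NAME", re.compile(r"[A-Za-z_]\w*")),
    ("SPACE", re.compile(r"\s+")),
]


def tokenize(text):
    """Yield (kind, lexeme) pairs, scanning text left to right without slicing."""
    pos = 0
    while pos < len(text):
        for kind, pattern in TOKEN_PATTERNS:
            # Pattern.match(string, pos) anchors the match at index pos
            # and does not copy the remainder of the string.
            m = pattern.match(text, pos)
            if m:
                yield kind, m.group()
                pos = m.end()
                break
        else:
            # No pattern matched at this position.
            raise ValueError(f"no pattern matches at position {pos}")


print(list(tokenize("foo 42 bar")))
```

Using match (rather than search) means each pattern must start exactly at pos, which is what a left-to-right scanner usually wants.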
The pos keyword is only available in the method versions. For example,
re.match("e+", "eee3", pos=1)
raises a TypeError (the module-level function has no pos parameter), but
pattern = re.compile("e+")
pattern.match("eee3", pos=1)
works.
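One difference from slicing is worth knowing. According to the re module documentation, pos is not completely equivalent to slicing: a `^` anchor still refers to the real start of the string (or, with re.MULTILINE, a position just after a newline), not to pos. The methods also take endpos, which behaves as if the string were only endpos characters long. A small sketch comparing the two:

```python
import re

text = "dog"
anchored = re.compile("^o")

# Slicing makes "o" the start of a new string, so "^o" matches.
print(anchored.match(text[1:]) is not None)  # True
# With pos=1, "^" still means the true start of text, so there is no match.
print(anchored.match(text, 1) is not None)   # False

# endpos limits the visible region: only "ee" of "eee3" can be matched.
print(re.compile("e+").match("eee3", 0, 2).group())  # ee
```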
>>> import re
>>> m = re.compile("(o+)")
>>> m.match("oooo").span()
(0, 4)
>>> m.match("oooo",2).span()
(2, 4)
You could also use positive lookbehinds, like so:
import re

test_string = "abcabdabe"
position = 3
# The lookbehind (?<=.{3}) can only succeed at index 3 or later.
a = re.search("(?<=.{" + str(position) + "})ab[a-z]", test_string)
print(a.group(0))
yields:
abd
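One caveat with the lookbehind trick: `.` does not match a newline unless re.DOTALL is set, so `(?<=.{3})` fails whenever a newline falls among the characters before a candidate match. It also builds a new pattern string for every position. The sketch below uses a made-up variant of the test string with a newline inserted, and compares it with the compiled-pattern search(text, pos) form from the first answer.

```python
import re

# Hypothetical variant of the test string above, with a newline at index 2.
test_string = "ab\nabdabe"
position = 3
regex = "(?<=.{" + str(position) + "})ab[a-z]"

# Without DOTALL, "." rejects the newline: the "abd" at index 3 is preceded
# by "ab\n", so the lookbehind fails there and the match lands on "abe".
lookbehind = re.search(regex, test_string)
print(lookbehind.group(0))  # abe

# With re.DOTALL, "." also matches "\n" and the lookbehind behaves as intended.
dotall = re.search(regex, test_string, re.DOTALL)
print(dotall.group(0))  # abd

# A compiled pattern with pos starts at index 3 and needs no lookbehind.
print(re.compile("ab[a-z]").search(test_string, position).group(0))  # abd
```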