
Applying a Regex to a Substring Without Using String Slices

Developer (https://www.devze.com) 2023-03-11 07:07 Source: web
I want to search for a regex match in a larger string from a certain position onwards, without using string slices.

My background: I want to search through a string iteratively for matches of various regexes. A natural solution in Python would be to keep track of the current position within the string and use, e.g.,

re.match(regex, largeString[pos:])

in a loop. But for really large strings (~1 MB), slicing as in largeString[pos:] becomes expensive, because each slice copies the remainder of the string. I'm looking for a way around that.
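For reference, here is a minimal sketch of the slicing loop described above. The token patterns and the function name tokenize_with_slices are hypothetical, chosen only for illustration. Note that every iteration builds rest = text[pos:], a fresh copy of the remaining text:

```python
import re

# Hypothetical token patterns (plain strings, as in re.match(regex, ...)).
TOKEN_REGEXES = [r"\d+", r"\w+", r"\s+", r"[^\w\s]"]


def tokenize_with_slices(text):
    """Naive scanner: slices off the remaining text on every step."""
    tokens = []
    pos = 0
    while pos < len(text):
        rest = text[pos:]  # creates a copy of the remaining text (for pos > 0)
        for regex in TOKEN_REGEXES:
            m = re.match(regex, rest)
            if m:
                tokens.append(m.group())
                pos += m.end()  # m.end() is relative to the slice, not to text
                break
        else:
            # None of the patterns matched at this position.
            raise ValueError(f"no pattern matches at position {pos}")
    return tokens


print(tokenize_with_slices("x = 42;"))  # ['x', ' ', '=', ' ', '42', ';']
```

On a ~1 MB input with many short tokens, these repeated copies are what make the loop expensive.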

Side note: funnily enough, one corner of the Python documentation mentions an optional pos parameter to the match function (which would be exactly what I want), yet the module-level functions themselves don't accept it :-).


The variants with pos and endpos parameters exist only as methods of compiled regular expression objects. Try this:

import re

pattern = re.compile("match here")
text = "don't match here, but do match here"
start = text.find(",")  # begin searching after the first "match here"
print(pattern.search(text, start).span())

This outputs (25, 35).
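Applied to the loop from the question, this suggests a scanner along the following lines. The token patterns and the function name tokenize_with_pos are hypothetical. Because Pattern.match(text, pos) is anchored at pos and m.end() is an index into the full string, the position can be advanced without ever slicing:

```python
import re

# Hypothetical token patterns, compiled once so that the pos argument
# of Pattern.match() is available.
TOKEN_PATTERNS = [
    re.compile(r"\d+"),
    re.compile(r"\w+"),
    re.compile(r"\s+"),
    re.compile(r"[^\w\s]"),
]


def tokenize_with_pos(text):
    """Scan text left to right without slicing it."""
    tokens = []
    pos = 0
    while pos < len(text):
        for pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)  # anchored at pos; text is not copied
            if m:
                tokens.append(m.group())
                pos = m.end()  # m.end() is an index into the full text
                break
        else:
            # None of the patterns matched at this position.
            raise ValueError(f"no pattern matches at position {pos}")
    return tokens


print(tokenize_with_pos("x = 42;"))  # ['x', ' ', '=', ' ', '42', ';']
```

Only the matched token itself is copied (by m.group()), so the work per step no longer grows with the size of the remaining input.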


The pos keyword is only available in the method versions. For example,

re.match("e+", "eee3", pos=1)

raises a TypeError, because the module-level re.match() has no pos parameter, but

pattern = re.compile("e+")
pattern.match("eee3", pos=1)

works.
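A short sketch of two related details from the re documentation: endpos can be passed by keyword alongside pos, and pos is not exactly equivalent to slicing, because "^" (without re.MULTILINE) only matches at the real beginning of the string, not at pos. The sample strings are made up for illustration:

```python
import re

pattern = re.compile("e+")

# pos and endpos both work as keyword arguments on the compiled pattern;
# endpos=2 makes the engine treat the string as if it were "ee".
m = pattern.match("eee3", pos=1, endpos=2)
print(m.span())  # (1, 2)

# pos is not fully equivalent to slicing: "^" does not match at pos.
caret = re.compile("^e")
print(caret.match("3ee", 1))                # None
print(re.match("^e", "3ee"[1:]).group(0))   # e
```

So if a pattern relies on "^" to mean "the current position", it needs to be rewritten (for example, by relying on match() being anchored at pos) when switching from slices to pos.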


>>> import re
>>> m = re.compile("(o+)")
>>> m.match("oooo").span()
(0, 4)
>>> m.match("oooo",2).span()
(2, 4)
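The same pos (and endpos) arguments are also accepted by the other Pattern methods, such as search(), findall() and finditer(). A small sketch using finditer() to collect every run of "o" from a given index onward; the text and the starting index are made up for illustration:

```python
import re

runs = re.compile("(o+)")
text = "foo boooo zoo"

# Iterate over all matches starting at index 4; the "oo" at index 1 is skipped,
# and the reported spans are still indices into the full string.
spans = [m.span() for m in runs.finditer(text, 4)]
print(spans)  # [(5, 9), (11, 13)]
```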


You could also use positive lookbehinds, like so:

import re

test_string = "abcabdabe"

position=3
a = re.search("(?<=.{" + str(position) + "})ab[a-z]", test_string)

print(a.group(0))

yields:

abd
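One caveat with the lookbehind approach: "." does not match a newline unless re.DOTALL is set, so a newline among the first position characters can push the match later than intended. The sketch below (with a made-up sample string) compares the lookbehind with and without re.DOTALL against a compiled pattern's pos argument:

```python
import re

text = "ab\nabdabe"
position = 3

lookbehind = "(?<=.{" + str(position) + "})ab[a-z]"

# Without DOTALL, "." cannot match the "\n", so the candidate at index 3 fails.
print(re.search(lookbehind, text).group(0))             # abe
# With DOTALL, "." matches any character, including "\n".
print(re.search(lookbehind, text, re.DOTALL).group(0))  # abd
# A compiled pattern with pos gives the intended result without a lookbehind.
print(re.compile("ab[a-z]").search(text, position).group(0))  # abd
```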