I have a file I need to parse. The parsing is built incrementally, such that on each iteration the expressions become more case-specific.
The code segment which overloads the system looks roughly like this:
    for item in ret:
        pat = r'a\sstyle=".+class="VEAPI_Pushpin"\sid="msftve(.+?)".+>%s<' % item[1]
        r = re.compile(pat, re.DOTALL)
        match = r.findall(f)
The file is a rather large HTML file (parsed from Bing Maps), and each answer must match its exact id.
Before applying this change the workflow performed well. Is there anything I can do to avoid this? Or to optimize the code?
My only guess is that you are getting too many matches and running out of memory. That doesn't seem very likely, but it might be the case. Try using finditer instead of findall to get one match at a time without creating a monster list of matches. If that doesn't fix your problem, you might have stumbled on a more serious bug in the re module.
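As a rough illustration of the finditer suggestion, here is a minimal sketch. The sample HTML, the labels, and the helper name find_ids are all made up for the example, and the re.escape call is an added assumption (it keeps regex metacharacters in a label from changing the pattern); the pattern itself is the one from the question.

```python
import re

# Hypothetical stand-in for the large HTML file from the question.
html = (
    '<a style="x" class="VEAPI_Pushpin" id="msftve1001" title="t">Cafe</a>\n'
    '<a style="y" class="VEAPI_Pushpin" id="msftve1002" title="t">Museum</a>\n'
)


def find_ids(text, label):
    """Yield captured ids one at a time instead of building a full list."""
    pattern = re.compile(
        r'a\sstyle=".+class="VEAPI_Pushpin"\sid="msftve(.+?)".+>%s<'
        % re.escape(label),  # assumption: labels should be matched literally
        re.DOTALL,
    )
    # finditer returns a lazy iterator of match objects, so only one
    # match is held at a time while looping.
    for match in pattern.finditer(text):
        yield match.group(1)


for label in ("Cafe", "Museum"):
    for found_id in find_ids(html, label):
        print(label, found_id)
```

This only changes how the matches are collected. The pattern does the same amount of matching work as before, so it helps only if the list built by findall is really what exhausts memory.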