How to force Python to ignore re.DOTALL in re.findall() statement?

Developer https://www.devze.com 2022-12-15 16:27 Source: Internet
I have been banging my head against the keyboard in search of enlightenment through Google and all Python docs I could get my hands on, but could not find an answer to an issue I'm encountering.

I have the following regex that I run against a website, but Python insists on setting re.DOTALL on it, even though my code does not tell it to:

\d+. +(?P<season>\d+) *\- *(?P<episode>\d+).*?(?P<day>\d+)(?:\/|\s)+(?P<month>[A-Za-z]+)(?:\/|\s)+(?P<year>\d+) +(?:<a .+><img .+></a>)? ?<a .*?>(?P<name>.*?)</a>

This creates an array of seasons/episodes for TV show listings, and it works fine except on epguides.com/BurnNotice (when using the TVRage listings), due to some spacing before newlines (I guess).

Using http://re-try.appspot.com to test, I've narrowed down the issue to the use of re.DOTALL. If I enable it on re-try, it replicates the results I get when I run it standalone on my script. If I untick DOTALL, then it gives me the results I expect.

How can I force Python NOT to use re.DOTALL?

The script runs both on Ubuntu and OS X.


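.+> should change to [^>]+> and .*?> to [^>]*>, so that each piece stops at the end of its own tag. As written, .+> is greedy and runs to the last > it can reach on the line.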
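To see why this helps, here is a minimal sketch. The sample HTML and the example.com URLs are made up for illustration (they are not copied from epguides.com), and the pattern is the one from the question with only those two substitutions applied, so treat it as a starting point rather than a verified fix for that page.

```python
import re

# Hypothetical one-line snippet with two anchors; not real epguides markup.
html = '<a href="x"><img src="y"></a> <a href="z">Pilot</a>'

# Greedy '.+' runs to the LAST '>' on the line, swallowing both anchors.
greedy = re.findall(r'<a .+>', html)

# '[^>]+' cannot cross a '>', so each match ends with its own tag.
bounded = re.findall(r'<a [^>]+>', html)

# The regex from the question, with '.+>' -> '[^>]+>' and '.*?>' -> '[^>]*>'.
PATTERN = re.compile(
    r"\d+. +(?P<season>\d+) *\- *(?P<episode>\d+).*?"
    r"(?P<day>\d+)(?:\/|\s)+(?P<month>[A-Za-z]+)(?:\/|\s)+(?P<year>\d+) +"
    r"(?:<a [^>]+><img [^>]+></a>)? ?<a [^>]*>(?P<name>.*?)</a>"
)

# Made-up listing lines in roughly the layout the question describes.
sample = (
    '1.   1- 1        28/Jun/07   <a href="http://example.com/a">Pilot</a>\n'
    '2.   1- 2         5/Jul/07   <a href="http://example.com/b">Identity</a>\n'
)

# No flags are passed, so DOTALL is not in effect here.
episodes = PATTERN.findall(sample)
for season, episode, day, month, year, name in episodes:
    print(season, episode, day, month, year, name)
```

With the greedy version, `greedy` holds a single match covering the whole line, while `bounded` holds the two opening tags separately.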

You can also try replacing the other dots with [^\r\n], but the two changes above should be enough.
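For what it's worth, Python does not turn on re.DOTALL by itself; re.findall() only uses the flags you pass. A quick check along these lines shows what the dot does by default and what [^\r\n] changes. The test strings are invented for the demo. Note also that \s matches newlines regardless of DOTALL, so the (?:\/|\s)+ parts of the pattern can cross line breaks on their own; whether that is what happens on the BurnNotice page is only a guess.

```python
import re

# Three tiny cases: LF between a and b, CR between a and b, and a plain dash.
text = "a\nb a\rb a-b"

# Default: '.' matches anything except '\n' (it still matches '\r').
default_dot = re.findall(r"a.b", text)

# Only when DOTALL is passed explicitly does '.' match '\n' as well.
dotall_dot = re.findall(r"a.b", text, re.DOTALL)

# '[^\r\n]' rules out both line-ending characters, with or without flags.
explicit = re.findall(r"a[^\r\n]b", text)

# A compiled pattern reports its flags, so DOTALL can be checked directly.
dotall_is_set = bool(re.compile(r"a.b").flags & re.DOTALL)

# '\s' matches newlines even without DOTALL.
whitespace_crosses = re.findall(r"1\s+2", "1\n2")

print(default_dot, dotall_dot, explicit, dotall_is_set, whitespace_crosses)
```

If a flag is being forced on by some wrapper you do not control, Python 3.6+ also accepts a scoped group such as (?-s:...) to switch DOTALL off for part of a pattern.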
