Python regex sub

Developer https://www.devze.com 2023-04-06 03:24 Source: Internet
I want to delete all comments. This is my regular expression:

re.sub(re.compile('<!--.*-->', re.DOTALL),'', text)

But if my text is:

bzzzzzz <!-- blabla --> blibli <!-- bloblo --> blublu

the result is:

bzzzzzz blublu

instead of:

bzzzzzz blibli blublu

Thanks for your help


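I'd suggest not using regex for this kind of thing. For HTML there is usually a more robust solution, such as lxml.html.clean (in recent lxml releases, I believe this module is distributed separately as the lxml_html_clean package).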

Your example:

import lxml.html.clean as clean
cleaner = clean.Cleaner(comments=True)
cleaner.clean_html("bzzzzzz <!-- blabla --> blibli <!-- bloblo --> blublu")
#'bzzzzzz  blibli  blublu'


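* is greedy while *? is not. The greedy .* matches as much as it can, so it runs from the first <!-- to the last --> and swallows everything in between. The non-greedy .*? stops at the nearest -->, so each comment is matched separately: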

re.sub(re.compile('<!--.*?-->', re.DOTALL), '', text)

or, even shorter:

re.sub('(?s)<!--.*?-->', '', text)
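
To see the difference side by side, here is a minimal sketch that runs both patterns on the text from the question. The helper name strip_comments is just an illustration, not something from the re module, and the multi-line sample is an assumed input added to show why DOTALL matters.

```python
import re

text = "bzzzzzz <!-- blabla --> blibli <!-- bloblo --> blublu"

# Greedy: .* runs from the first "<!--" to the LAST "-->".
greedy = re.sub(r"<!--.*-->", "", text, flags=re.DOTALL)

# Non-greedy: .*? stops at the NEAREST "-->", so each comment goes separately.
lazy = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)

# Hypothetical helper: compile once if the pattern is reused often.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_comments(s):
    """Remove every <!-- ... --> span, including ones that cross newlines."""
    return COMMENT_RE.sub("", s)

print(repr(greedy))  # 'bzzzzzz  blublu'
print(repr(lazy))    # 'bzzzzzz  blibli  blublu'
print(repr(strip_comments("a <!-- line1\nline2 --> b")))  # 'a  b'
```

Note that the surrounding spaces are kept, so each removed comment leaves two spaces behind. Without DOTALL (or the inline (?s) flag), . does not match newlines and a comment that spans several lines would be left in place.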