
Python RegExp exception

开发者 (Developer) https://www.devze.com 2022-12-29 15:33 Source: Internet
How do I split on all nonalphanumeric characters, EXCEPT the apostrophe?

re.split(r'\W+', text)

works, but will also split on apostrophes. How do I add an exception to this rule?

Thanks!


Try this:

re.split(r"[^\w']+",text)

Note that the w is now lowercase: \w matches any alphanumeric character or the underscore, whereas \W matches everything else. The character class [^\w'] matches any single character that is not (^) a word character (\w) or an apostrophe.
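To make the difference concrete, here is a small sketch comparing the two patterns. The sample sentence and the variable names are my own, not from the question.

```python
import re

# Sample input chosen for illustration only.
text = "Don't stop, it's John's book!"

# \W+ treats the apostrophe as a separator, so contractions are broken up.
split_plain = re.split(r"\W+", text)

# [^\w']+ leaves the apostrophe out of the separators, so contractions survive.
split_keep = re.split(r"[^\w']+", text)

print(split_plain)  # ['Don', 't', 'stop', 'it', 's', 'John', 's', 'book', '']
print(split_keep)   # ["Don't", 'stop', "it's", "John's", 'book', '']
```

The trailing empty string appears because the text ends with a separator ("!"). Filter it out if you don't want it.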


re.split(r"[^\w']+",text)

Starting a character class with ^ inverts it, so [^\w'] is the inverse of [\w'], which matches any alphanumeric character, underscore, or apostrophe.
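Since [\w'] is the positive form of the same class, you could also collect the words directly instead of splitting on the separators. This is just an alternative sketch, not what the question asked for. The helper name tokens is made up for the example.

```python
import re

def tokens(text):
    # tokens is a hypothetical helper name, not part of the re module.
    # Collect every run of word characters and apostrophes.
    return re.findall(r"[\w']+", text)

print(tokens("Don't stop, it's John's book!"))
# ["Don't", 'stop', "it's", "John's", 'book']
```

Unlike re.split, re.findall never returns empty strings for leading or trailing separators.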


The answers here don't work, as 'quoted' words will not be stripped of their apostrophes.

What works for me is

re.split(r"\W'+|^'+|'+\W|'$|[^\w']+", text)

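i.e. split on:

apostrophe(s) after a non-word character OR apostrophe(s) at the start of the text OR apostrophe(s) before a non-word character OR an apostrophe at the end of the text OR the pattern from the other answers ([^\w']+). Note that the adjacent non-word character is consumed as part of the separator.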
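Here is a quick sketch of how that pattern behaves on quoted words. The sample sentences and the helper name split_words are my own. I also drop the empty strings that re.split leaves when a separator sits at the very start or end of the text.

```python
import re

# Pattern from the answer above: also split on apostrophes at a word edge.
EDGE_PATTERN = r"\W'+|^'+|'+\W|'$|[^\w']+"

def split_words(text):
    # split_words is a hypothetical helper name, not part of the re module.
    parts = re.split(EDGE_PATTERN, text)
    # Separators at the start or end of the text leave empty strings; drop them.
    return [p for p in parts if p]

print(split_words("He said 'hello world' and it's fine"))
# ['He', 'said', 'hello', 'world', 'and', "it's", 'fine']
```

The apostrophe inside "it's" is kept because it has word characters on both sides, so none of the edge alternatives match there.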
