Regex for removing whitespace

Developer https://www.devze.com 2022-12-17 03:34 Source: web
def remove_whitespaces(value):
    "Remove all whitespaces"
    p = re.compile(r'\s+')
    return p.sub(' ', value)

The above code strips tags but doesn't remove "all" whitespaces from the value.

Thanks


The fastest general approach eschews REs in favor of string's fast, powerful .translate method:

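import string

# Python 2 only: a 256-byte identity table, so translate maps every byte to itself.
identity = string.maketrans('', '')

def remove_whitespace(value):
  # The second argument lists the characters to delete.
  return value.translate(identity, string.whitespace)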

In Python 2.6, it's even simpler; the function body is just

  return value.translate(None, string.whitespace)

Note that this applies to "plain" Python 2.* strings, i.e., bytestrings. The .translate method of Unicode strings is somewhat different: it takes a single argument, which must be a mapping from the ord values of Unicode characters to Unicode strings, or to None for deletion. For example, taking advantage of dict's handy .fromkeys classmethod:

nospace = dict.fromkeys(ord(c) for c in string.whitespace)

def unicode_remove_whitespace(value):
  return value.translate(nospace)

to remove exactly the same set of characters. Of course, Unicode has more characters that you could consider whitespace and want to remove, so you'd probably want to build a mapping unicode_nospace based on information from module unicodedata, rather than using this simpler approach based on module string.
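
As a rough sketch of that last idea, here is one way such a mapping might be built in Python 3 (where every str is Unicode). The helper _is_unicode_space is a hypothetical name, and its definition of whitespace (general category Zs, or bidirectional class WS, B or S) is an assumption chosen for illustration; adjust it to whatever you actually count as whitespace.

```python
import sys
import unicodedata

def _is_unicode_space(ch):
    # Assumed definition of whitespace: general category Zs (space separator),
    # or bidirectional class WS, B or S.
    return (unicodedata.category(ch) == "Zs"
            or unicodedata.bidirectional(ch) in ("WS", "B", "S"))

# Map every whitespace code point to None, which .translate treats as deletion.
# This scans the whole code point range once, at import time.
unicode_nospace = {
    cp: None
    for cp in range(sys.maxunicode + 1)
    if _is_unicode_space(chr(cp))
}

def unicode_remove_whitespace(value):
    return value.translate(unicode_nospace)
```

Building the table costs a one-time pass over all code points; after that each call is a single .translate.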


p.sub(' ', value)

should be

p.sub('', value)

The former replaces all whitespace with a single space, the latter replaces with nothing.
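
A quick way to see the difference, using a made-up sample string (Python 3):

```python
import re

p = re.compile(r'\s+')
sample = "a  b\t\nc"  # illustrative input, not from the question

collapsed = p.sub(' ', sample)  # each run of whitespace becomes one space
removed = p.sub('', sample)     # each run of whitespace is deleted
```

Here collapsed still contains spaces between the letters, while removed contains none.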


Maybe ''.join(value.split()) could work for you?


re.sub(r'\s*', '', value) should also work!


re.sub(r'\s', '', value) works well for me in this case.
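
For what it's worth, when the replacement is the empty string, the patterns suggested in this thread should all give the same result. A small check on a made-up sample (Python 3):

```python
import re

sample = " a \t b\n\nc "  # illustrative input

# Delete whitespace with each pattern mentioned in the answers.
results = [re.sub(pattern, '', sample) for pattern in (r'\s', r'\s+', r'\s*')]
```

The patterns differ only in how many matches the engine makes: r'\s' matches one character at a time, r'\s+' matches whole runs, and r'\s*' also matches empty strings between characters.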


@OP, compile your regex pattern outside the function, so you don't have to call re.compile every time you use it. Also, you are substituting a single space back in, which doesn't remove the whitespace, does it?

import re

# Compiled once at module level and passed in, so it is reused across calls.
p = re.compile(r'\s+')
def remove_whitespaces(p, value):
    "Remove all whitespaces"
    return p.sub('', value)

Lastly, another method that doesn't use a regex is to split on whitespace and join the pieces back together:

def remove_whitespaces(value):
    "Remove all whitespaces"    
    return ''.join(value.split())
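
A small usage sketch for the split/join version, assuming Python 3, where split() with no arguments also splits on non-ASCII whitespace such as the no-break space; the sample string is made up:

```python
def remove_whitespaces(value):
    "Remove all whitespace by splitting on it and rejoining the pieces"
    return ''.join(value.split())

# Leading, trailing and interior whitespace are all dropped.
cleaned = remove_whitespaces("  a b\t\nc\u00a0d ")
```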