How to replace with a regular expression and lowercase the match in Python

Developer https://www.devze.com 2022-12-27 03:46 Source: Internet
I want to search for keywords (the keys are dynamic) and replace them in a certain format. For example, this data:

keys = ["cat", "dog", "mouse"]
text = "Cat dog cat cloud miracle DOG MouSE"

has to be converted to:

converted_text = "[Cat](cat) [dog](dog) [cat](cat) cloud miracle [DOG](dog) [MouSE](mouse)"

Here is my code:

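import re
keys = "cat|dog|mouse"
# Flags are passed as arguments: an inline (?iu) in mid-pattern is an error in Python 3.11+.
p = re.compile(r'\b(?P<name>%s)\b' % keys, re.IGNORECASE | re.UNICODE)
converted_text = p.sub(r'[\g<name>](\g<name>)', text)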

This works fine, except that I can't convert the second copy of the match to lowercase. The result looks like this:

converted_text = "[Cat](Cat) [dog](dog) [cat](cat) cloud miracle [DOG](DOG) [MouSE](MouSE)"

How can I convert the second copy to lowercase? It seems Python's re module does not support the \L escape.


You can use a function to do the replacing:

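import re

# keys is the list from the question; re.escape makes each key match literally.
pattern = re.compile('|'.join(map(re.escape, keys)), re.IGNORECASE)
def format_term(term):
    return '[%s](%s)' % (term, term.lower())

# The replacement callable receives a match object; group(0) is the matched text.
converted_text = pattern.sub(lambda m: format_term(m.group(0)), text)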
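
Note that a pattern built only from the joined keys has no word boundaries, so it would also match a key inside a longer word (for example "cat" in "concatenate"). A minimal sketch that keeps the \b anchors from the question, assuming every key starts and ends with a word character; the function name link_keywords is made up for illustration:

```python
import re

def link_keywords(text, keys):
    # Escape each key so regex metacharacters inside a key match literally.
    alternation = "|".join(map(re.escape, keys))
    # \b anchors keep "cat" from matching inside "concatenate".
    pattern = re.compile(r"\b(?:%s)\b" % alternation, re.IGNORECASE)
    # The callback keeps the original spelling and lowercases the second copy.
    return pattern.sub(
        lambda m: "[%s](%s)" % (m.group(0), m.group(0).lower()), text
    )

keys = ["cat", "dog", "mouse"]
text = "Cat dog cat cloud miracle DOG MouSE"
converted_text = link_keywords(text, keys)
print(converted_text)
```

Passing a function as the replacement is what makes the lowercasing possible here, since the replacement template syntax of re.sub has no case-conversion escape.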


There is no need to use a regex:

>>> keys = ["cat", "dog", "mouse"]
>>> text = "Cat dog cat cloud miracle DOG MouSE"
>>> for w in text.split():
...     if w.lower() in keys:
...        print("[%s](%s)" % (w, w.lower()), end=" ")
...     else:
...        print(w, end=" ")
...
[Cat](cat) [dog](dog) [cat](cat) cloud miracle [DOG](dog) [MouSE](mouse)


From your proposed solution, I assume I don't need to keep the keys as a list (I'll use a set, to make searching faster). This answer also assumes all words in the text are separated by a space (which I'll use to join them back). Given these, you can use:

>>> keys = set(["cat", "dog", "mouse"])
>>> text = "Cat dog cat cloud miracle DOG MouSE"
>>> converted =  " ".join(("[%s](%s)" % (word, word.lower()) if word.lower() in keys else word) for word in text.split())
>>> converted
'[Cat](cat) [dog](dog) [cat](cat) cloud miracle [DOG](dog) [MouSE](mouse)'

Granted, this calls word.lower() twice. You can avoid this (and still use a similar approach) using two list comprehensions (or, actually, generator expressions):

>>> converted =  " ".join(("[%s](%s)" % (word, lower) if lower in keys else word) for word, lower in ((w, w.lower()) for w in text.split()))
>>> converted
'[Cat](cat) [dog](dog) [cat](cat) cloud miracle [DOG](dog) [MouSE](mouse)'
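
Both one-liners rely on text.split(), so a keyword with punctuation attached (for example "cat,") is not recognised, and runs of whitespace are collapsed to single spaces. One possible way around this, sketched here as an assumption rather than as part of the answer above, is to let re.sub visit every run of word characters and look each one up in the set; link_words is a hypothetical helper name:

```python
import re

def link_words(text, keys):
    keys = set(keys)  # set membership tests are fast on average

    def replace(match):
        word = match.group(0)
        lower = word.lower()  # computed once per word
        return "[%s](%s)" % (word, lower) if lower in keys else word

    # \w+ matches each run of word characters; punctuation and spacing stay untouched.
    return re.sub(r"\w+", replace, text)

result = link_words("Cat, dog! cloud MouSE.", ["cat", "dog", "mouse"])
print(result)
```

This keeps the set lookup of the split-based version while leaving the original separators in place.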