I use the following to replace everything except letters (from any language) and digits with a space:
text = regex.sub(r"[^\p{alpha}\d]+", " ", text)
Can I use \p{alpha} to convert letters to their lowercase equivalent, if such an equivalent exists? What would this regex look like?
>>> re.sub('[AEIOU]+', lambda m: m.group(0).lower(), 'SOME TEXT HERE')
'SoMe TeXT HeRe'
As oxtopus suggested, you can simply convert letters to their lowercase version with text.lower(); no regular expression is needed. This works with Unicode strings too (À -> à, etc.).
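For illustration, here is a minimal sketch of combining the two steps. It assumes Python 3 and uses only the standard re module, where the class [\W_] is a rough stand-in for the question's [^\p{alpha}\d] (the \p{alpha} syntax needs the third-party regex module). The function name normalize is made up for this example.

```python
import re

def normalize(text):
    # Lowercase first; str.lower() also handles non-ASCII letters (À -> à).
    lowered = text.lower()
    # Collapse every run of non-letter, non-digit characters into one space.
    # Assumption: [\W_] approximates the question's [^\p{alpha}\d].
    return re.sub(r"[\W_]+", " ", lowered)

print(normalize("ÀB-CD, Éf"))
```

Lowercasing before or after the substitution should give the same result here, since the substitution only touches non-letters.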
I believe you can find your answer here: http://docs.python.org/library/re.html#re.sub
You can pass a tolower function as the replacement argument to sub. It receives each match object and returns the replacement string.
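A small sketch of what that could look like, assuming the pattern [A-Z]+ and a sample string chosen only for illustration:

```python
import re

def tolower(match):
    # re.sub calls this once per match; the return value replaces the match.
    return match.group(0).lower()

result = re.sub(r"[A-Z]+", tolower, "ABCDEF_ghjiklm_OPQRSTUVWXYZ")
print(result)
```

Any pattern can be swapped in; only the matched spans are passed to tolower.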
You can change the pattern in re.findall("([A-Z]+)", text) to whatever regex you need. The loop goes through the matches and replaces each one with its lowercase equivalent:
import re

text = 'ABCDEF_ghjiklm_OPQRSTUVWXYZ'
# Replace each uppercase run with its lowercase form
for f in re.findall("([A-Z]+)", text):
    text = text.replace(f, f.lower())
print(text)
Output is:
abcdef_ghjiklm_opqrstuvwxyz
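One caveat with the loop: str.replace rewrites every occurrence of the matched substring, not just the position that matched, so a narrower pattern can lowercase more than intended. The sketch below uses a hypothetical word-boundary pattern and sample text to show the difference from a callable replacement in re.sub.

```python
import re

text = "AB ABC"
pattern = r"\bAB\b"  # hypothetical: match only the standalone word "AB"

# Loop-and-replace: the "AB" inside "ABC" gets rewritten as well.
looped = text
for f in re.findall(pattern, text):
    looped = looped.replace(f, f.lower())

# Callable replacement: only the matched span changes.
subbed = re.sub(pattern, lambda m: m.group(0).lower(), text)

print(looped)
print(subbed)
```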
In Perl, the substitution syntax supports \L to lowercase the replacement text; Python's re module has no equivalent.