I have a string like this:
s = 'word1 word2 (word3 word4) word5 word6 (word7 word8) word9 word10'
How can I delete everything that is in brackets, so that the output is:
'word1 word2 word5 word6 word9 word10'
I tried a regular expression, but that doesn't seem to work. Any suggestions?
Best Jacques
import re
s = re.sub(r'\(.*?\)', '', s)
Note that this deletes only the parentheses and what is between them. This means you'll be left with a double space between "word2" and "word5". Output from my terminal:
>>> re.sub(r'\(.*?\)', '', s)
'word1 word2  word5 word6  word9 word10'
>>> # -------^ -----------^ (Note double spaces there)
However, that is not the output you asked for. To remove the extra spaces, you can do something like this:
>>> re.sub(r'\(.*?\)\ *', '', s)
'word1 word2 word5 word6 word9 word10'
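As a rough sketch of the same idea wrapped in a function (the name remove_parenthesized is made up here for illustration), one could remove the groups first and then collapse whatever whitespace is left over. This assumes the brackets are not nested and that collapsing every run of whitespace to a single space is acceptable:

```python
import re

def remove_parenthesized(text):
    # Drop each non-nested "(...)" group; the lazy .*? stops at the first ")".
    without_groups = re.sub(r'\(.*?\)', '', text)
    # Collapse the whitespace runs left behind and trim both ends.
    return ' '.join(without_groups.split())

s = 'word1 word2 (word3 word4) word5 word6 (word7 word8) word9 word10'
print(remove_parenthesized(s))  # word1 word2 word5 word6 word9 word10
```

Unlike the trailing-space pattern above, this also tidies up a group that sits at the very end of the string, at the cost of normalizing all other whitespace too.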
My solution is better only because it also deletes the extra space character ;-)
re.sub(r"\s\(.*?\)", "", s)
EDIT: You are right, it does not catch all cases. Of course, I can write a more complex expression that takes more detail into account:
re.sub(r"\s*\(.*?\)\s*", " ", s)
Now the result is the desired string, or a single space " " if the original string consists only of parentheses and spaces.
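To illustrate the difference between the two expressions, here is a small sketch using a made-up input where the bracket sits at the start of the string. The variable names and the final strip() call are my own additions, not part of the answer above:

```python
import re

first = r'\s\(.*?\)'       # requires one whitespace character before "("
second = r'\s*\(.*?\)\s*'  # whitespace on either side is optional

start = '(word1 word2) word3'

# Nothing precedes "(", so the first pattern never matches.
print(repr(re.sub(first, '', start)))            # '(word1 word2) word3'

# The second pattern matches, but leaves the replacement space in front.
print(repr(re.sub(second, ' ', start)))          # ' word3'

# Stripping afterwards removes that leading (or trailing) space.
print(repr(re.sub(second, ' ', start).strip()))  # 'word3'
```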
You should replace all occurrences of this regex: \([^\)]*\)
with the empty string.
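In Python that could look roughly like the sketch below. One practical difference from the lazy .*? variants, as far as I can tell, is that a negated character class also matches newlines, so a bracket that spans two lines is still removed. The sample string is invented for the demonstration:

```python
import re

# Inside a character class the ")" does not need a backslash.
pattern = re.compile(r'\([^)]*\)')

multi = 'word1 (word2\nword3) word4'

# [^)] matches any character except ")", including the newline.
print(repr(pattern.sub('', multi)))         # 'word1  word4'

# By default "." does not match a newline, so this leaves the text unchanged.
print(repr(re.sub(r'\(.*?\)', '', multi)))  # 'word1 (word2\nword3) word4'
```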
You could go through it character by character, keeping one string that is the result, one string that holds the discarded text, and a boolean that records whether or not you're deleting right now.
Then, for each character, if the boolean is true you add it to the delete string, and if it's false you add it to the result string. If the character is an open bracket, you add it to the delete string and set the boolean to true; if it's a close bracket, you set the delete string to "" and set the boolean to false.
Finally, this leaves you at the end with a non-empty delete string only IF a bracket was opened but never closed.
If you want to deal with nested brackets, use an integer count of how many you've opened but not closed, instead of a boolean.
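A minimal sketch of that loop, using the integer count so nested brackets work as well. The function name strip_brackets is hypothetical, and treating a stray ")" with no matching "(" as ordinary text is my own assumption, since the description above does not cover that case:

```python
def strip_brackets(text):
    kept = []       # characters outside any bracket (the result string)
    discarded = []  # characters of the bracket group currently open
    depth = 0       # number of "(" opened but not yet closed

    for ch in text:
        if ch == '(':
            depth += 1
            discarded.append(ch)
        elif ch == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                discarded.clear()     # outermost group closed: forget it
            else:
                discarded.append(ch)  # still inside an outer group
        elif depth > 0:
            discarded.append(ch)
        else:
            kept.append(ch)           # assumption: a stray ")" is kept

    # A non-empty second value means a bracket was opened but never closed.
    return ''.join(kept), ''.join(discarded)

print(strip_brackets('a (b (c) d) e'))  # ('a  e', '')
print(strip_brackets('a (b'))           # ('a ', '(b')
```

Like the plain regex, this leaves the spaces on both sides of each removed group in place, so the result may still contain double spaces.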
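If the format of your lines is always like the one you show, you could probably try it without regexes, though note that this removes only the bracket characters themselves and keeps the words inside them: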
>>> s.replace('(','').replace(')','')
'word1 word2 word3 word4 word5 word6 word7 word8 word9 word10'
This is about 4 times faster than regular expressions:
>>> import timeit
>>> t1 = timeit.Timer("s.replace('(','').replace(')','')", "from __main__ import s")
>>> t2 = timeit.Timer(r"sub(r'\(.*?\)\ *', '', s)", "from __main__ import s; from re import sub")
>>> t1.repeat()
[0.73440917436073505, 0.6970294320000221, 0.69534249907820822]
>>> t2.repeat()
[2.7884134544113408, 2.7414613750137278, 2.7336896241081377]