I'm trying to write a regular expression that matches all words inside a specific string, but skips words inside brackets. I currently have one regex that matches all words:
/[a-z0-9]+(-[a-z0-9]+)*/i
I also have a regex that matches all words inside brackets:
/\[(.*)\]/i
I basically want to match everything that the first regex matches, but without everything the second regex matches.
Sample input text: http://gist.github.com/222857. It should match every word separately, except the ones in the brackets.
Any help is appreciated. Thanks!
Perhaps you could do it in two steps:
- Remove all the text within brackets.
- Use a regular expression to match the remaining words.
Using a single regular expression to try to do both these things will end up being more complicated than it needs to be.
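A minimal sketch of those two steps in Ruby. The helper name `words_outside_brackets` is made up for illustration, and it assumes brackets are not nested and do not span lines. The bracketed text is replaced with a space rather than an empty string so that words on either side of it are not glued together.

```ruby
# Word pattern from the question, with a non-capturing group so that
# String#scan returns whole matches instead of group contents.
WORD = /[a-z0-9]+(?:-[a-z0-9]+)*/i

# Hypothetical helper: drop bracketed spans, then collect the remaining words.
def words_outside_brackets(text)
  # Step 1: remove each [...] span (non-greedy, so every pair is handled separately).
  without_brackets = text.gsub(/\[.*?\]/, ' ')
  # Step 2: match the remaining words.
  without_brackets.scan(WORD)
end

p words_outside_brackets("foo [bar baz] qux-quux 42")
# ["foo", "qux-quux", "42"]
```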
How 'bout this:
your_text.scan(/\[.*\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i) - [[nil]]
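One thing to watch for: because the pattern has a capture group, `scan` returns an array of one-element arrays, and the greedy `.*` runs from the first `[` to the last `]` on a line. The sketch below compares the line above with a variant that uses a non-greedy `.*?` and flattens the result. The sample text is invented for illustration.

```ruby
text = "alpha [skip me] beta-gamma [also skipped] delta"

# As written above: greedy, so everything between the first "[" and the
# last "]" on the line is swallowed, including "beta-gamma".
greedy = text.scan(/\[.*\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i) - [[nil]]
p greedy
# [["alpha"], ["delta"]]

# Variant: non-greedy bracket match, then flatten and drop the nils that
# come from the bracket alternative.
words = text.scan(/\[.*?\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i).flatten.compact
p words
# ["alpha", "beta-gamma", "delta"]
```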
Which Ruby version are you using? If it's 1.9 or later, this should do what you want:
/(?<![\[a-z0-9-])[a-z0-9]+(-[a-z0-9]+)*(?![\]a-z0-9-])/i
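A quick check of how this behaves, using made-up sample strings. The group is made non-capturing here so that `scan` returns the whole words. Note that the lookarounds only inspect the characters directly next to a word, so a word in the middle of a longer bracketed phrase still matches.

```ruby
# Same pattern as above, with (?:...) instead of (...) so scan yields full matches.
pattern = /(?<![\[a-z0-9-])[a-z0-9]+(?:-[a-z0-9]+)*(?![\]a-z0-9-])/i

# A single bracketed word is skipped: "beta" is preceded by "[".
single = "alpha [beta] gamma".scan(pattern)
p single
# ["alpha", "gamma"]

# With three bracketed words, "one" and "three" touch a bracket and are
# skipped, but "two" touches neither bracket and still matches.
triple = "alpha [one two three] gamma".scan(pattern)
p triple
# ["alpha", "two", "gamma"]
```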
I don't think I understand the question properly. Why not just make a new string that does not contain the second regex like so:
string1 = string1.gsub(/\[(.*)\]/, '')
Off the top of my head, that should delete everything the second regex matches and store the result in string1. I have not tested it yet, though. I might test it later.
I agree with Shhnap. Without more info, it sounds like the easiest way is to remove what you don't want, but the regex needs to be the non-greedy /\[(.*?)\]/ instead. After that you can split on \s.
If you are trying to iterate through each word, and you want each word to match, maybe you can cheat a little with string.split(/\W+/). You will lose the quotation marks and whatnot, but you get each word.
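For reference, a small sketch of what that split returns on an invented sample string. It also splits hyphenated words, because `-` is a non-word character, and it does nothing about brackets by itself, so the bracketed text would have to be removed first.

```ruby
sample = "say \"hello\" to foo-bar"

# Split on runs of non-word characters: quotes, spaces and hyphens all separate.
parts = sample.split(/\W+/)
p parts
# ["say", "hello", "to", "foo", "bar"]

# If the string starts with a non-word character, the first element is empty.
leading = "[a] b".split(/\W+/)
p leading
# ["", "a", "b"]
```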
This seems to work:
[^\[][a-z0-9]+(-[a-z0-9]+)*
If the first letter of a word is an opening bracket, it doesn't match it.
btw, is there a reason why you are capturing the words with dashes in them? If no need for that, your regex could be simplified.