
Looking for a regex that matches all words, except the ones [inside brackets]

Developer (https://www.devze.com), 2022-12-09 18:24. Source: Internet
I'm trying to write a regular expression that matches all words inside a specific string, but skips words inside brackets. I currently have one regex that matches all words:

/[a-z0-9]+(-[a-z0-9]+)*/i

I also have a regex that matches all words inside brackets:

/\[(.*)\]/i

I basically want to match everything that the first regex matches, but without everything the second regex matches.

Sample input text: http://gist.github.com/222857

It should match every word separately, except the ones in the brackets.

Any help is appreciated. Thanks!


Perhaps you could do it in two steps:

  1. Remove all the text within brackets.
  2. Use a regular expression to match the remaining words.

Using a single regular expression to try to do both these things will end up being more complicated than it needs to be.
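The two steps above could be sketched in Ruby roughly as follows. The sample text is made up for illustration (the gist from the question is not reproduced here), and the word pattern is the asker's, with its group made non-capturing so that String#scan returns whole matches:

```ruby
# Hypothetical sample text, not the input from the question's gist.
text = "The quick [brown fox] jumps over the well-known [lazy] dog"

# Step 1: remove every bracketed segment. [^\]]* stops at the first
# closing bracket, so each bracket pair is removed on its own.
without_brackets = text.gsub(/\[[^\]]*\]/, "")

# Step 2: collect the remaining words. The group is non-capturing,
# otherwise scan would return the group contents instead of the words.
words = without_brackets.scan(/[a-z0-9]+(?:-[a-z0-9]+)*/i)

p words
# => ["The", "quick", "jumps", "over", "the", "well-known", "dog"]
```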


How 'bout this:

your_text.scan(/\[.*?\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i) - [[nil]]
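To see why this works: the left alternative consumes a whole bracketed segment without capturing anything, so scan yields [nil] for it, while the right alternative captures a word. A small sketch, assuming a made-up sample string and a bracket pattern that stops at the first closing bracket:

```ruby
# Hypothetical sample text, not the input from the question's gist.
text = "The quick [brown fox] jumps over the well-known [lazy] dog"

# Left side: swallow a bracketed segment (nothing captured).
# Right side: capture a word, optionally with dash-joined parts.
pattern = /\[[^\]]*\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i

# scan returns one single-element array per match; bracketed segments
# show up as [nil], which flatten.compact discards.
words = text.scan(pattern).flatten.compact

p words
# => ["The", "quick", "jumps", "over", "the", "well-known", "dog"]
```

Using flatten.compact here is just one way to drop the nil entries; subtracting [[nil]] as in the one-liner leaves an array of one-element arrays instead.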


Which Ruby version are you using? If it's 1.9 or later, this should do what you want:

/(?<![\[a-z0-9-])[a-z0-9]+(-[a-z0-9]+)*(?![\]a-z0-9-])/i
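It may be worth checking this pattern against brackets that hold more than one word, since the lookarounds only look at the characters directly next to a word. A quick sketch with a made-up sample string (the group is made non-capturing here so that scan returns whole matches):

```ruby
# Hypothetical sample text, not the input from the question's gist.
text = "alpha [beta] gamma-delta [one two three]"

# Same lookbehind/lookahead idea as above: a word must not touch
# a bracket, and must not be a fragment of a longer word.
pattern = /(?<![\[a-z0-9-])[a-z0-9]+(?:-[a-z0-9]+)*(?![\]a-z0-9-])/i

matches = text.scan(pattern)

p matches
# => ["alpha", "gamma-delta", "two"]
```

Single bracketed words are skipped, but "two" still matches because it touches neither bracket, so this approach fits best when each pair of brackets contains exactly one word.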


I don't think I understand the question properly. Why not just make a new string that does not contain the second regex like so:

string1 = string1.gsub(/\[(.*)\]/, '')

Off the top of my head, won't that delete whatever the second regex matches and store the result in string1? I have not tested this yet, though.


I agree with Shhnap. Without more info, it sounds like the easiest way is to remove what you don't want, but the pattern needs to be the non-greedy /\[(.*?)\]/ instead. After that you can split on \s.

If you are trying to iterate through each word, and you want each word to match, maybe you can cheat a little with string.split(/\W+/). You will lose the quotation marks and the like, but you get each word.
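A rough sketch of the remove-then-split idea, assuming a made-up sample string. Note that splitting on \W+ also breaks dash-joined words apart, and that a leading non-word character produces an empty first element, which is rejected here:

```ruby
# Hypothetical sample text, not the input from the question's gist.
text = "[note] The quick [brown fox] jumps over the well-known dog"

# Remove bracketed segments non-greedily, then split on runs of
# non-word characters and drop any empty strings.
words = text.gsub(/\[.*?\]/, "").split(/\W+/).reject(&:empty?)

p words
# => ["The", "quick", "jumps", "over", "the", "well", "known", "dog"]
```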


This seems to work:

[^\[][a-z0-9]+(-[a-z0-9]+)*

If the first character of a word is an opening bracket, it doesn't match it.

By the way, is there a reason why you are capturing the words with dashes in them? If there is no need for that, your regex could be simplified.
