
regular expression to remove substrings delimited by matching double braces

Developer https://www.devze.com 2022-12-20 18:32 Source: web

I have a string like this:

adfsdf dsf  {{sadfsdfadf {{Infobox}} musical}} jljlk }}

I want to eliminate all {{..}} substrings. I tried

\{\{.*\}\}

which eliminates {{sadfsdfadf {{Infobox}} musical}} jljlk }}, but I want to eliminate only {{sadfsdfadf {{Infobox}} musical}}, stopping at the }} that closes the opening {{ rather than at the last }} in the string.

How can I do this?


Use a lazy quantifier:

\{\{.*?\}\}
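A quick Ruby check of what the greedy and lazy versions actually match on the sample string from the question (the variable names are just for illustration). The lazy quantifier stops at the first }} after the opening {{, which here is the one closing the inner {{Infobox}}, so on its own it only gives the balanced result when the pairs are not nested.

```ruby
sample = 'adfsdf dsf  {{sadfsdfadf {{Infobox}} musical}} jljlk }}'

greedy = sample[/\{\{.*\}\}/]   # .*  runs on to the LAST }} in the string
lazy   = sample[/\{\{.*?\}\}/]  # .*? stops at the FIRST }} after the opening {{

p greedy # => "{{sadfsdfadf {{Infobox}} musical}} jljlk }}"
p lazy   # => "{{sadfsdfadf {{Infobox}}"
```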


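Here's a fairly non-robust expression that will work on your sample: \{\{[a-zA-Z\s]*\}\}. It only matches a {{...}} pair whose contents are letters and whitespace, i.e. an innermost pair, so the outer pair can only match once the inner one has been removed.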
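A minimal sketch of putting that expression to use on nested input, assuming it is applied repeatedly until nothing changes (the helper `strip_innermost_pairs` is hypothetical, not part of the answer). Each `gsub` pass deletes the innermost pairs, which exposes the next level for the following pass.

```ruby
# The expression from the answer: a {{...}} pair holding only letters and
# whitespace, which in practice means an innermost pair.
INNERMOST = /\{\{[a-zA-Z\s]*\}\}/

# Hypothetical helper: delete innermost pairs, then repeat on the result so
# each pass peels off one level of nesting. Stops when a pass changes nothing.
def strip_innermost_pairs(text)
  loop do
    stripped = text.gsub(INNERMOST, '')
    return stripped if stripped == text
    text = stripped
  end
end

sample = 'adfsdf dsf  {{sadfsdfadf {{Infobox}} musical}} jljlk }}'
p strip_innermost_pairs(sample) # => "adfsdf dsf   jljlk }}"
```

The trailing }} survives because it has no opening {{. Digits, punctuation, or a single { or } inside a pair will stop this character class from matching; [^{}]* is a looser variant that still rejects stray braces.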


In the general case, this isn't possible with regular expressions. A regular expression in the strict sense cannot match balanced parentheses, or any other arbitrarily nested delimiters; you need a context-free grammar instead.

That said, Perl has some facilities for recursive regular expressions; these would allow you to do what you want. I do not know if Ruby is capable of doing the same thing.
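If you would rather not depend on regex recursion at all, a plain scan works. For a single kind of delimiter, a depth counter is enough to stand in for the stack a context-free parser would use. This is a sketch; the helper name `remove_double_braces` is hypothetical, and it makes no attempt to handle escaped braces.

```ruby
# Hypothetical helper: remove every balanced {{...}} span with a depth counter
# instead of a regular expression. A }} seen at depth 0 has no opener, so it is
# kept as ordinary text. Caveat: an unclosed {{ drops everything after it.
def remove_double_braces(text)
  out = +''
  depth = 0
  i = 0
  while i < text.length
    if text[i, 2] == '{{'
      depth += 1   # entering a (possibly nested) pair
      i += 2
    elsif text[i, 2] == '}}' && depth > 0
      depth -= 1   # closing the innermost open pair
      i += 2
    else
      out << text[i] if depth.zero?  # keep only text outside all pairs
      i += 1
    end
  end
  out
end

p remove_double_braces('adfsdf dsf  {{sadfsdfadf {{Infobox}} musical}} jljlk }}')
# => "adfsdf dsf   jljlk }}"
```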


Here is a quick example using a recent 1.9.x Ruby version. If you run a 1.8.x release, you'll need the oniguruma gem. This doesn't take escaped \{\{ into account, but it does handle single { and } characters, which I assume you will want to ignore.

#!/usr/bin/env ruby
# On old 1.8.x versions of Ruby you'll need the oniguruma gem:
# require 'oniguruma'
require 'pp'

squiggly = %r/
  (
    (?<squiggly>         # squiggly named group
      \{\{               # start {{
        (?:              # non matching group
          [^{}]          # anything not { or }
          | \{[^{]       # any { not followed by {
          | \}[^}]       # any } not followed by }
          | \g<squiggly> # nested squiggly
        )*               # zero or more times
      \}\}               # end }}
    )                    # end of squiggly
  )/x

string = 'adfsdf dsf  {{sadfsdfadf {{Infobox}} musical}} jljlk }}'
pp squiggly.match(string)[:squiggly] #=> {{sadfsdfadf {{Infobox}} musical}}
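Since the question asks to remove the pairs rather than just find the first one, the same subexpression-call idea (`\g<name>`, Ruby 1.9+) can be handed to `gsub`. This is a sketch of a slight variant of the pattern above, not the answer's exact regex: a lone brace is matched with a lookahead instead of consuming the character after it, so input such as `{{a}{{b}}}}` still nests correctly. The constant name `NESTED_PAIR` is illustrative.

```ruby
# Variant of the squiggly pattern: lone braces use lookaheads so they do not
# swallow the next character, and \g<pair> recurses into nested pairs.
NESTED_PAIR = /
  (?<pair>
    \{\{
      (?:
          [^{}]          # any character that is not a brace
        | \{(?!\{)       # a single { not followed by another {
        | \}(?!\})       # a single } not followed by another }
        | \g<pair>       # a nested pair, via a recursive subexpression call
      )*
    \}\}
  )
/x

string = 'adfsdf dsf  {{sadfsdfadf {{Infobox}} musical}} jljlk }}'
p string.gsub(NESTED_PAIR, '') # => "adfsdf dsf   jljlk }}"
```

`gsub` removes each outermost pair in a single call; the unmatched }} at the end is left alone because no {{ opens it.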