I want to remove "un-partnered" parentheses from a string.
I.e., all (
's should be removed unless they're followed by a )
somewhere in the string. Likewise, all )
's not preceded by a (
somewhere in the string should be removed.
Ideally the algorithm would take into account nesting as well.
E.g.:
"(a)".remove_unmatched_parents # => "(a)"
"a(".remove_unmatched_parent开发者_如何学Pythons # => "a"
")a(".remove_unmatched_parents # => "a"
Instead of a regex, consider a push-down automata, perhaps. (I'm not sure if Ruby regular expressions can handle this, I believe Perl's can).
A (very trivialized) process may be:
For each character in the input string:
- If it is not a '(' or ')' then just append it to the output
- If it is a '(' increase a seen_parens counter and add it
- If it is a ')' and seen_parens is > 0, add it and decrease seen_parens. Otherwise skip it.
At the end of the process, if seen_parens is > 0 then remove that many parens, starting from the end. (This step can be merged into the above process with use of a stack or recursion.)
The entire process is O(n)
, even if a relatively high overhead
Happy coding.
The following uses oniguruma. Oniguruma is the regex engine built in if you are using ruby1.9. If you are using ruby1.8, see this: oniguruma.
Update
I had been so lazy to just copy and paste someone else's regex. It seemed to have problem.
So now, I wrote my own. I believe it should work now.
class String
NonParenChar = /[^\(\)]/
def remove_unmatched_parens
self[/
(?:
(?<balanced>
\(
(?:\g<balanced>|#{NonParenChar})*
\)
)
|#{NonParenChar}
)+
/x]
end
end
(?<name>regex1)
names the (sub)regexregex1
asname
, and makes it possible to be called.?g<name>
will be a subregex that representsregex1
. Note here that?g<name>
does not represent a particular string that matchedregex1
, but it representsregex1
itself. In fact, it is possible to embed?g<name>
within(?<name>...)
.
Update 2
This is simpler.
class String
def remove_unmatched_parens
self[/
(?<valid>
\(\g<valid>*\)
|[^()]
)+
/x]
end
end
Build a simple LR parser:
tokenize, token, stack = false, "", []
")(a))(()(asdf)(".each_char do |c|
case c
when '('
tokenize = true
token = c
when ')'
if tokenize
token << c
stack << token
end
tokenize = false
when /\w/
token << c if tokenize
end
end
result = stack.join
puts result
running yields:
wesbailey@feynman:~/code_katas> ruby test.rb
(a)()(asdf)
I don't agree with the folks modifying the String class because you should never open a standard class. Regexs are pretty brittle for parser and hard to support. I couldn't imagine coming back to the previous solutions 6 months for now and trying to remember what they were doing!
Here's my solution, based on @pst's algorithm:
class String
def remove_unmatched_parens
scanner = StringScanner.new(dup)
output = ''
paren_depth = 0
while char = scanner.get_byte
if char == "("
paren_depth += 1
output << char
elsif char == ")"
output << char and paren_depth -= 1 if paren_depth > 0
else
output << char
end
end
paren_depth.times{ output.reverse!.sub!('(', '').reverse! }
output
end
end
Algorithm:
- Traverse through the given string.
- While doing that, keep track of "(" positions in a stack.
- If any ")" found, remove the top element from the stack.
- If stack is empty, remove the ")" from the string.
- In the end, we can have positions of unmatched braces, if any.
Java code: Present @ http://a2ajp.blogspot.in/2014/10/remove-unmatched-parenthesis-from-given.html
精彩评论