The input is a string representing a list of elements.
A list is defined as an open curly {
followed by 0 or more elements separated by whitespace followed by a closed curly }
.
An element is either a literal or a list of elements.
A literal is a succession of non-whitespace characters. If an element contains a curly bracket, it must be escaped with a backslash : \{
and \}
. (Or you could assume curlies are not allowed inside literals, for simplicity)
Example:
"{abc { def ghi } 7 { 1 {2} {3 4} } {5 6} x\{yz \}foo }"
No curlies inside literals:
"{abc { def ghi } 7 { 1 {2} {3 4} } {5 6} xyz foo }"
(This is a simplified definition of a Tcl list.)
What I want to know is: can the input be split into the elements of the outermost loop using regex?
Expected output:
abc
{ def ghi }
7
{ 1 {2} {3 4} }
{5 6}
x{yz
}foo
The real qu开发者_Python百科estion is: can this be done with a Regex?
I'm most interested in the .NET flavour, but will accept any answers.
I'll post my own assumption in an answer, and see if it's validated or destroyed.
Unfortunately the answer is YES for some flavor of Regex, e.g. PCRE and .NET because they support recursive pattern and stack-like operations respectively.
The grammar can be written as
ELEMENT -> (?!\{)\S+ | LIST
LIST -> '\{\s*' ELEMENT? ('\s+' ELEMENT)* '\s*\}'
thus in PCRE, this can be transformed into the pattern:
\{\s*(?0)?(?:\s+(?0))*\s*\}|(?!\{)(?:[^\s}]|\}(?![\s}]))+
# --------------------------- ^^^^^^^^^
# LIST Make sure the } is not closing the group
See http://www.ideone.com/SnGsU for example (I have stripped the top-level {
and }
for simplicity).
(Of course, don't try this at work :) )
(BTW, I don't know how to transform this PCRE into .NET flavor. If someone knows, please try Converting PCRE recursive regex pattern to .NET balancing groups definition)
Well, the edit removes curly braces from tokens and takes the sting from the question, and now it is easily doable with .Net Regexes, using balancing groups. It is simply matching braces, which is a basic example.
Much like KennyTM's answer, this will only work if you remove the top level braces, or it will match the whole input.
Again, this is better used for recreational purposes:
(?: # try matching...
(?:\\[{}]|[^\s{}])+\s*? # a literal (allow escaped curly braces)
| # OR
(?<Curly>{)\s* # "{" and push to stack
| # OR
(?<-Curly>})\s*? # "}", pop from stack and fail if the stack is empty
)+? # ...a few times, and stop whenever you can.
(?(Curly)(?!)) # Make sure there aren't any extra open curly braces
For much more details see this article: Regex Balancing Group in Depth
The traditional answer to this is a resounding "NO". As we have learned in the compilers class, a regular grammar can't be used to describe a language with a recursive definition (i.e. you can't use a finite state machine)
What's needed here is a context free parser, the implementation of which boils down to a finite state machine + a STACK.
See ANTLR, bison etc.
@Cristi is right about the regex: It is theoretically impossible to solve recursive expressions using a stackless, finite-state automaton. The solution, however, is simpler: you only need to keep a counter of the number of open parentheses, and make sure it doesn't drop below 0. It is more memory-saving than maintaining a stack, and you only need the count - not the contents - of the Parentheses.
Algorithm:
counter = 0 // Number of open parens
For char c in string:
print c
if c=='{': // Keep track on number of open parens
counter++
if c=='}':
counter--
if counter==1: // New line if we're back to the top level
print "\n"
elif counter<1: // Error if the nesting is malformed
print "ERROR: parentheses mismatch"
break
精彩评论