I'm trying to match Pascal string literal input against the following pattern: @"^'([^']|(''))*'$", but it's not working. What is wrong with the pattern?
public void Run()
{
    using(StreamReader reader = new StreamReader(String.Empty))
    {
        var LineNumber = 0;
        var LineContent = String.Empty;
        while(null != (LineContent = reader.ReadLine()))
        {
            LineNumber++;
            String[] InputWords = new Regex(@"\(\*(?:\w|\d)*\*\)").Replace(LineContent.TrimStart(' '), @" ").Split(' ');
            foreach(String word in InputWords)
            {
                Scanner.Scan(word);
            }
        }
    }
}
I search the input string for any Pascal comment, replace it with whitespace, then split the input into substrings and match each of them against the following:
private void Initialize()
{
    MatchingTable = new Dictionary<TokenUnit.TokenType, Regex>();
    MatchingTable[TokenUnit.TokenType.Identifier] = new Regex
    (
        @"^[_a-zA-Z]\w*$",
        RegexOptions.Compiled | RegexOptions.Singleline
    );
    MatchingTable[TokenUnit.TokenType.NumberLiteral] = new Regex
    (
        @"(?:^\d+$)|(?:^\d+\.\d*$)|(?:^\d*\.\d+$)",
        RegexOptions.Compiled | RegexOptions.Singleline
    );
}
// ... Here it all comes together
public TokenUnit Scan(String input)
{
    foreach(KeyValuePair<TokenUnit.TokenType, Regex> node in this.MatchingTable)
    {
        if(node.Value.IsMatch(input))
        {
            return new TokenUnit
            {
                Type = node.Key
            };
        }
    }
    return new TokenUnit
    {
        Type = TokenUnit.TokenType.Unsupported
    };
}
The pattern appears to be correct, although it could be simplified:
^'(?:[^']+|'')*'$
Explanation:
^ # Match start of string
' # Match the opening quote
(?: # Match either...
[^']+ # one or more characters except the quote character
| # or
'' # two quote characters (= escaped quote)
)* # any number of times
' # Then match the closing quote
$ # Match end of string
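In .NET, the annotated form above can be used almost verbatim by passing RegexOptions.IgnorePatternWhitespace, which makes the engine ignore unescaped whitespace in the pattern and treat # as the start of a comment. A minimal sketch; the class and method names (PascalStringPattern, IsPascalString) are made up for illustration and are not part of the question's code:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class PascalStringPattern
{
    // Same pattern as above, written in free-spacing mode so the
    // explanatory comments can live inside the regex itself.
    private static readonly Regex Literal = new Regex(@"
        ^           # start of string
        '           # opening quote
        (?:         # either...
            [^']+   #   one or more characters except the quote character
          |         # or
            ''      #   two quote characters = escaped quote
        )*          # any number of times
        '           # closing quote
        $           # end of string
        ", RegexOptions.IgnorePatternWhitespace);

    // True if the whole input is a single-quoted Pascal string literal.
    public static bool IsPascalString(string input)
    {
        return Literal.IsMatch(input);
    }
}
```

With this, IsPascalString("'It''s fine'") should return true and IsPascalString("'It's broken'") should return false.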
This regex will fail if the input you're checking it against contains anything besides a Pascal string (say, surrounding whitespace).
So if you want to use the regex to find Pascal strings within a larger text corpus, then you need to remove the ^ and $ anchors.
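As a rough illustration of the unanchored variant, the sketch below pulls every string literal out of a line with Regex.Matches. The names PascalStringFinder and FindLiterals are hypothetical, chosen only for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class PascalStringFinder
{
    // The simplified pattern with the ^ and $ anchors removed,
    // so it can match anywhere inside a longer line.
    private static readonly Regex Literal = new Regex(@"'(?:[^']+|'')*'");

    // Returns every Pascal string literal found in the line, in order.
    public static List<string> FindLiterals(string line)
    {
        var result = new List<string>();
        foreach (Match m in Literal.Matches(line))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

For the line writeln('Hello, world', 'it''s'); this should yield the two literals 'Hello, world' and 'it''s'. It may also be relevant to the code in the question: splitting a line on spaces before matching cuts a literal such as 'Hello, world' into the fragments 'Hello, and world', and neither fragment can match the anchored pattern.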
And if you want to allow double quotes, too, then you need to augment the regex:
^(?:'(?:[^']+|'')*'|"(?:[^"]+|"")*")$
In C#:
foundMatch = Regex.IsMatch(subjectString, "^(?:'(?:[^']+|'')*'|\"(?:[^\"]+|\"\")*\")$");
This regex will match strings like
'This matches.'
'This too, even though it ''contains quotes''.'
"Mixed quotes aren't a problem."
''
It won't match strings like
'The quotes aren't balanced or escaped.'
There is something 'before or after' the quotes.
"Even whitespace is a problem."