I want to validate an input string against a regular expression and then split it.
The input string can be any combination of the letter A and letter A followed by an exclamation mark. For example these are valid input strings: A, A!, AA, AA!, A!A, A!A!, AAA, AAA!, AA!A, A!AA, ... Any other characters should yield an invalid match.
My code would probably look something like this:
public string[] SplitString(string s)
{
    Regex regex = new Regex(@"...");
    if (!regex.IsMatch(s))
    {
        throw new ArgumentException("Wrong input string!");
    }
    return regex.Split(s);
}
What should my regex look like?
Edit - some examples:
- input string "AAA", function should return an array of 3 strings ("A", "A", "A")
- input string "A!AAA!", function should return an array of 4 strings ("A!", "A", "A", "A!")
- input string "AA!b", function should throw an ArgumentException
Doesn't seem like a Regex is a good plan here. Have a look at this:
private bool ValidString(string myString)
{
    char[] validChars = new char[] { 'A', '!' };
    // Every token starts with 'A', so the whole string must too.
    if (!myString.StartsWith("A"))
        return false;
    // A '!' may only follow an 'A', never another '!'.
    if (myString.Contains("!!"))
        return false;
    foreach (char c in myString)
    {
        // Contains on an array needs "using System.Linq;".
        if (!validChars.Contains(c))
            return false;
    }
    return true;
}
private List<string> SplitMyString(string myString)
{
    List<string> resultList = new List<string>();
    if (ValidString(myString))
    {
        foreach (char c in myString)
        {
            if (c == 'A')
            {
                // Each 'A' starts a new token.
                resultList.Add("A");
            }
            else
            {
                // A '!' attaches to the "A" just before it.
                resultList[resultList.Count - 1] += c;
            }
        }
    }
    return resultList;
}
The reason I don't think Regex is a good plan is that you can write the logic out in a few simple if-statements, which will likely run faster and cheaper than a regular expression. A regex can validate the input easily enough, but reusing it to split the string is awkward. You'll either end up writing a long Regex or something illegible.
EDIT
At the end of my code you will either have a List<string> with the split input string like in your question, or an empty List<string>. You can adjust it a little to throw an ArgumentException if that requirement is very important to you. Alternatively you can check Count on the list to see if it was successful.
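As a rough sketch of the "adjust it a little to throw" variant, the validation and the split can be folded into one pass. The class name LoopSplitter and the method name SplitOrThrow are made up for illustration and are not part of the answer above; treating an empty string as invalid is also an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class LoopSplitter
{
    // Hypothetical helper: validates and splits in a single pass over the input.
    public static List<string> SplitOrThrow(string s)
    {
        // Assumption: null or empty input counts as invalid.
        if (string.IsNullOrEmpty(s))
            throw new ArgumentException("Wrong input string!");

        var tokens = new List<string>();
        foreach (char c in s)
        {
            if (c == 'A')
            {
                // Each 'A' starts a new token.
                tokens.Add("A");
            }
            else if (c == '!' && tokens.Count > 0 && tokens[tokens.Count - 1] == "A")
            {
                // A '!' is only legal directly after a bare "A".
                tokens[tokens.Count - 1] = "A!";
            }
            else
            {
                // Any other character, a leading '!', or a doubled '!' is invalid.
                throw new ArgumentException("Wrong input string!");
            }
        }
        return tokens;
    }
}
```

With this shape, "A!AAA!" comes back as four tokens and "AA!b" throws as soon as the 'b' is reached, so no separate Count check is needed.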
Regex regex = new Regex(@"^(A!|A)+$");
Edit:
Use something like http://gskinner.com/RegExr/ to play with Regular Expressions
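If you'd rather check the pattern from code than in an online tester, a minimal sketch could look like this. The class name PatternCheck and the method IsValid are hypothetical names chosen for the example; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // The pattern from the answer: one or more tokens, each either "A!" or "A".
    private static readonly Regex Valid = new Regex(@"^(A!|A)+$");

    // Hypothetical helper: true when the whole string consists of valid tokens.
    public static bool IsValid(string s)
    {
        return Valid.IsMatch(s);
    }
}
```

IsValid should accept the question's examples such as "AAA" and "A!AAA!" and reject "AA!b", a leading "!" and a doubled "!!". One thing to be aware of: in .NET, $ also matches just before a final newline, so swap it for \z if a trailing "\n" must be rejected too.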
Edit after comment:
OK, you have made it clearer what you want. Don't approach it like that: a regex that matches the entire input cannot also be used to split it, because the single match is the entire input. Either use a separate regular expression for the split part, or use groups to get the matched values.
Example:
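// Split part: a separate pattern from the one that validates the input.
// The capturing groups make Split include the matched tokens in the result;
// it also returns empty strings between adjacent matches, which need filtering out.
Regex regex2 = new Regex(@"(A!)|(A)");
return regex2.Split(s);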
And again, regular expressions are not always the answer. See how this might impact your application.
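For the "use groups to get the matched values" option, here is one possible sketch. It relies on the fact that .NET keeps every capture of a repeated group, so a single anchored pattern can both validate and yield the tokens. The class name CaptureSplitter is made up for the example, and the pattern ^(A!?)+$ is my shorthand for the (A!|A) alternation.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureSplitter
{
    // Anchored pattern: the whole input must be a run of "A" or "A!" tokens.
    private static readonly Regex Tokens = new Regex(@"^(A!?)+$");

    public static string[] SplitString(string s)
    {
        Match match = Tokens.Match(s);
        if (!match.Success)
        {
            throw new ArgumentException("Wrong input string!");
        }
        // Group 1 repeats, and .NET records each repetition as a separate Capture.
        return match.Groups[1].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToArray();
    }
}
```

This avoids Split entirely, so there are no empty entries to filter out afterwards.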
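You could try something like this (note that it only checks which characters are used, so it also accepts strings such as "!A" or "A!!"):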
Regex regex = new Regex(@"^[A!]+$");
((A+!?)+)
Try looking at Expresso http://www.ultrapico.com/Expresso.htm or Rad Software Regular Expression Designer http://www.radsoftware.com.au/regexdesigner/ for designing and testing regular expressions.
I think I have a solution that satisfies all examples. I've had to break it into two regular expressions (which I don't like)...
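public string[] SplitString(string s)
{
    // Validate the whole string: one or more "A" tokens, each optionally followed by "!".
    // (A plain ^[A!]+$ would also let "!A" and "A!!" through.)
    Regex regex = new Regex(@"^(A!?)+$");
    if (!regex.IsMatch(s))
    {
        throw new ArgumentException("Wrong input string!");
    }
    // The capturing group makes Split return the tokens themselves.
    return Regex.Split(s, @"(A!?)").Where(x => !string.IsNullOrEmpty(x)).ToArray();
}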
Note the use of LINQ (System.Linq), which is required to remove the empty strings that Split returns between adjacent matches.