I'm not sure if "token replace" is the right phrase, but here is what I'm trying to do:
In a string, if I find two or more consecutive whitespace characters (\s), i.e. spaces, new lines, tabs, etc., I want to replace whatever was matched with a single instance of itself.
Example (with several consecutive spaces between the letters):
a b b
would become
a b b
and (with blank lines between the letters):
a
b
c
Would become:
a
b
c
Can this be done using .NET regex?
You'll need to use this if you want to correctly replace double new-lines as well as spaces:
string input = @"a
b
c d e";
string result = Regex.Replace(input, @"(\r\n|\s)\1", "$1");
The \1 looks for the character(s) matched by the group (\r\n|\s), and the $1 in the replacement string replaces the match with a single instance of the group.
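To make the behaviour of the backreference concrete, here is a minimal sketch that wraps the pattern above in a helper. The class and method names (PairDemo, CollapsePairs) are made up for illustration. Note that \1 matches exactly one repeat, so a run of three spaces is only reduced to two.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PairDemo
{
    // Hypothetical helper: replaces each doubled whitespace token
    // (CRLF or a single whitespace character) with one copy of it.
    public static string CollapsePairs(string input)
    {
        return Regex.Replace(input, @"(\r\n|\s)\1", "$1");
    }

    public static void Main()
    {
        // Two spaces collapse to one.
        Console.WriteLine(CollapsePairs("a  b") == "a b");
        // Three spaces: only the first pair collapses, so two remain.
        Console.WriteLine(CollapsePairs("a   b") == "a  b");
        // A doubled CRLF collapses to a single CRLF.
        Console.WriteLine(CollapsePairs("a\r\n\r\nb") == "a\r\nb");
    }
}
```

Each line printed by Main should be True.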
If you want to replace more than one duplicate (e.g. 3 in a row) with a single instance, you'll need to use @"(\r\n|\s)\1+" as the pattern, but a side effect of this is that repeated line breaks (blank lines) are collapsed as well, so:
a
b
c
will be reduced to:
a
b
c
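As a rough illustration of the \1+ variant and its side effect, the sketch below runs the pattern on a few inputs. The names RunDemo and CollapseRuns are hypothetical; only Regex.Replace and the pattern itself come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RunDemo
{
    // Hypothetical helper: collapses a run of the same whitespace token
    // (CRLF or a single whitespace character) into one instance of it.
    public static string CollapseRuns(string input)
    {
        return Regex.Replace(input, @"(\r\n|\s)\1+", "$1");
    }

    public static void Main()
    {
        // Runs of spaces and runs of tabs each shrink to a single character.
        Console.WriteLine(CollapseRuns("a   b\t\tc") == "a b\tc");
        // Side effect: blank lines disappear because repeated CRLFs are collapsed.
        Console.WriteLine(CollapseRuns("a\r\n\r\n\r\nb") == "a\r\nb");
        // Mixed whitespace (space followed by tab) is left untouched.
        Console.WriteLine(CollapseRuns("a \tb") == "a \tb");
    }
}
```

Each line printed by Main should be True. The last case shows that the backreference only merges repeats of the same character, not arbitrary whitespace runs.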
For posterity, my solution from this question:
Regex regex_select_all_multiple_whitespace_chars =
    new Regex(@"\s+", RegexOptions.Compiled);
var cleanString =
    regex_select_all_multiple_whitespace_chars.Replace(dirtyString.Trim(), " ");
Regex is NOT the best way to do this. Brute force methods seem to be much faster. Take a read of the link above...
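For readers wondering what a brute-force approach might look like here, this is one possible single-pass sketch that trims the string and collapses every whitespace run to a single space. It is an assumption about what such a method could be, not the code from the linked question; the names BruteForceWhitespace and CollapseToSingleSpace are invented, and no timing claims are made for it.

```csharp
using System;
using System.Text;

public static class BruteForceWhitespace
{
    // Hypothetical helper: one pass over the input, no regex.
    // Leading and trailing whitespace is dropped; inner runs become one space.
    public static string CollapseToSingleSpace(string input)
    {
        var builder = new StringBuilder(input.Length);
        bool pendingSpace = false;

        foreach (char c in input)
        {
            if (char.IsWhiteSpace(c))
            {
                // Remember a separator only if something was already written,
                // which drops leading whitespace.
                pendingSpace = builder.Length > 0;
            }
            else
            {
                // Emit the deferred separator just before the next visible
                // character, so trailing whitespace is never written.
                if (pendingSpace)
                {
                    builder.Append(' ');
                }
                pendingSpace = false;
                builder.Append(c);
            }
        }

        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(CollapseToSingleSpace("  a \t\r\n b  ") == "a b");
    }
}
```

Whether this actually beats the compiled regex depends on the input and runtime, so it is worth measuring before switching.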
Yes, it can. Use System.Text.RegularExpressions.Regex.Replace:
string str = "a b b";
Regex rexReplace = new Regex(" +");
str = rexReplace.Replace(str, new MatchEvaluator(delegate(Match match)
{
    return " ";
}));
This is possible with a regex, but it gets a bit unwieldy after adding more than a few choices. Here's a sample of the regex which handles only spaces and tabs.
public static string ShrinkWhitespace(string input)
{
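    // Each alternative captures the first character of a run in group "t" and drops the repeats.
    // A literal space is used instead of \s, which would also match tabs and new lines.
    return Regex.Replace(input, @"(?<t> ) +|(?<t>\t)\t+", "${t}");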
}
I find methods like this are much easier to follow and maintain if they are instead coded as simple methods. For example:
public string ShrinkWhitespace(string input) {
    var builder = new StringBuilder();
    var i = 0;
    while ( i < input.Length ) {
        var current = input[i];
        builder.Append(current);
        switch ( current ) {
            case '\t':
            case ' ':
            case '\n':
                i++;
                while ( i < input.Length && input[i] == current ) {
                    i++;
                }
                break;
            default:
                i++;
                break;
        }
    }
    return builder.ToString();
}
string str = "a b c a\r\n\r\nb\r\n\r\nc";
string newstr = Regex.Replace(str, "(\u0020)+", " ");
newstr = Regex.Replace(newstr, "(\t)+", "\t");
newstr = Regex.Replace(newstr, "(\r\n)+", "\r\n");