Regex token replace in .NET?

Developer https://www.devze.com 2023-01-17 14:29 Source: Internet
I'm not sure if "token replace" is the right phrase, but here is what I'm trying to do:

In a string, if I find two or more consecutive whitespace characters (\s), i.e. spaces, newlines, tabs, etc., I want to replace whatever was matched with a single instance of that character.

Example:

a   b   b 

would become

a b b

and:

a


b


c

would become:

a

b

c

Can this be done using a .NET regex?


You'll need to use this if you want to correctly replace double newlines as well as double spaces:

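using System.Text.RegularExpressions;

// The line breaks inside this verbatim string are whatever the source file uses
// ("\r\n" for a file saved with Windows line endings).
string input = @"a


b


c  d  e";

string result = Regex.Replace(input, @"(\r\n|\s)\1", "$1");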

The \1 looks for the same character(s) that were matched by the group (\r\n|\s), and the $1 in the replacement string replaces the match with just a single instance of the group.

If you want to collapse longer runs of duplicates (e.g. three in a row) into a single instance, you'll need to use @"(\r\n|\s)\1+" as the pattern, but a side effect is that text such as:

a


b


c

will be reduced to:

a
b
c
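A small sketch of the difference between the two patterns, wrapped in a hypothetical helper class (WhitespaceDemo, CollapseOnce and CollapseRuns are illustrative names, not from the answer). Without the +, every match consumes exactly one pair, so a run of four spaces ends up as two; with \1+ the whole run collapses to a single instance of the captured token.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class illustrating the two patterns from the answer above.
public static class WhitespaceDemo
{
    // (\r\n|\s)\1 : replaces each *pair* of identical tokens with one token.
    // A run of four spaces is matched as two separate pairs, so two spaces remain.
    public static string CollapseOnce(string input) =>
        Regex.Replace(input, @"(\r\n|\s)\1", "$1");

    // (\r\n|\s)\1+ : replaces a whole run of identical tokens with one token.
    // Different whitespace characters next to each other (e.g. " \t") are not merged.
    public static string CollapseRuns(string input) =>
        Regex.Replace(input, @"(\r\n|\s)\1+", "$1");
}
```

Because \r\n is one of the alternatives, a Windows line break can be captured as a single token, so "a\r\n\r\n\r\nb" becomes "a\r\nb" with CollapseRuns.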


For posterity, my solution from this question:

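using System.Text.RegularExpressions;

// Matches any run of one or more whitespace characters (spaces, tabs, newlines...).
Regex regex_select_all_multiple_whitespace_chars =
    new Regex(@"\s+", RegexOptions.Compiled);

// Trims the ends, then turns every whitespace run (newlines included) into a single space.
var cleanString =
    regex_select_all_multiple_whitespace_chars.Replace(dirtyString.Trim(), " ");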

Regex is NOT the best way to do this: brute-force methods seem to be much faster. Have a read of the link above for details.
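For comparison, a brute-force version of the same normalization might look like the sketch below. It is a hypothetical helper (BruteForceWhitespace.NormalizeWhitespace is not from the linked question) that walks the string once, treats anything char.IsWhiteSpace accepts as whitespace (which should cover the same characters as .NET's \s), collapses each run to a single space and drops leading and trailing whitespace, mirroring Trim() followed by \s+ replaced with " ". No timing claims are made here; measure on your own data.

```csharp
using System.Text;

// Hypothetical single-pass replacement for Trim() + Regex(@"\s+").Replace(..., " ").
public static class BruteForceWhitespace
{
    public static string NormalizeWhitespace(string dirty)
    {
        var sb = new StringBuilder(dirty.Length);
        // True when whitespace has been seen since the last non-whitespace character
        // and at least one non-whitespace character has already been written.
        bool pendingSpace = false;

        foreach (char c in dirty)
        {
            if (char.IsWhiteSpace(c))
            {
                // Defer the space: leading whitespace never sets the flag,
                // and trailing whitespace is never flushed (like Trim()).
                pendingSpace = sb.Length > 0;
                continue;
            }
            if (pendingSpace)
            {
                sb.Append(' ');
                pendingSpace = false;
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```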


Yes, it can. Use System.Text.RegularExpressions.Regex.Replace:

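using System.Text.RegularExpressions;

string str = "a   b   b";
Regex rexReplace = new Regex(" +");  // one or more spaces (tabs and newlines are not matched)
// The evaluator is called once per match; here it always returns a single space.
str = rexReplace.Replace(str, match => " ");
// str is now "a b b"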
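The same MatchEvaluator idea extends to every kind of whitespace if the evaluator returns the captured character instead of a fixed space. A minimal sketch using a lambda; EvaluatorDemo.Shrink is a hypothetical name:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the MatchEvaluator variant generalized to any whitespace character.
public static class EvaluatorDemo
{
    public static string Shrink(string input)
    {
        // (\s)\1+ matches a run of two or more identical whitespace characters;
        // the lambda returns the captured character, i.e. a single instance of the run.
        return Regex.Replace(input, @"(\s)\1+", m => m.Groups[1].Value);
    }
}
```

Returning m.Groups[1].Value is equivalent to passing "$1" as the replacement string; an evaluator mainly pays off when the replacement needs logic that a substitution pattern cannot express. This pattern does not treat \r\n as one token, so "\r\n\r\n" is left unchanged.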


This is possible with a regex, but it gets a bit unwieldy once you add more than a few choices. Here's a sample regex that handles only spaces and tabs:

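public static string ShrinkWhitespace(string input)
{
    // (?<t>[ \t]) captures a space or a tab; \k<t>+ requires one or more repeats of that same character.
    // The whole run is replaced with the single captured character.
    return Regex.Replace(input, @"(?<t>[ \t])\k<t>+", "${t}");
}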

I find logic like this much easier to follow and maintain when it is written as a plain method instead. For example:

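public string ShrinkWhitespace(string input) {
  var builder = new StringBuilder();  // requires using System.Text;
  var i = 0;
  while ( i < input.Length ) {
    var current = input[i];
    // Always keep the first character of a run.
    builder.Append(current);
    switch ( current ) {
      case '\t':
      case ' ':
      case '\n':
        // Whitespace: skip every immediately following copy of the same character.
        // ("\r\n\r\n" is not collapsed, because '\r' and '\n' alternate.)
        i++;
        while ( i < input.Length && input[i] == current ) {
          i++;
        }
        break;
      default:
        i++;
        break;
    }
  }

  return builder.ToString();
}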
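The loop above compares single characters, so a run of Windows line breaks ("\r\n\r\n") is not collapsed. A possible extension, sketched under the assumption that "\r\n" should be treated as one token (LoopShrink is a hypothetical class name):

```csharp
using System.Text;

// Hypothetical extension of the loop above: also collapses repeated Windows line breaks ("\r\n").
public static class LoopShrink
{
    public static string ShrinkWhitespace(string input)
    {
        var builder = new StringBuilder(input.Length);
        var i = 0;
        while (i < input.Length)
        {
            // A token is either "\r\n" or a single character.
            string token = (input[i] == '\r' && i + 1 < input.Length && input[i + 1] == '\n')
                ? "\r\n"
                : input[i].ToString();
            builder.Append(token);
            i += token.Length;

            // Only whitespace tokens are collapsed: skip immediately repeated copies.
            if (token == "\r\n" || token == " " || token == "\t" || token == "\n")
            {
                while (i + token.Length <= input.Length &&
                       string.CompareOrdinal(input, i, token, 0, token.Length) == 0)
                {
                    i += token.Length;
                }
            }
        }
        return builder.ToString();
    }
}
```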


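using System.Text.RegularExpressions;

string str = "a  b  c       a\r\n\r\nb\r\n\r\nc";

// \u0020 is the space character: collapse runs of spaces.
string newstr = Regex.Replace(str, "(\u0020)+", " ");

// Collapse runs of tabs.
newstr = Regex.Replace(newstr, "(\t)+", "\t");

// Collapse runs of Windows line breaks.
newstr = Regex.Replace(newstr, "(\r\n)+", "\r\n");
// newstr is now "a b c a\r\nb\r\nc"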