Best way to specify whitespace in a String.Split operation

Developer https://www.devze.com 2023-03-08 12:26 Source: Internet
I am splitting a string based on whitespace as follows:

string myStr = "The quick brown fox jumps over the lazy dog";

char[] whitespace = new char[] { ' ', '\t' };
string[] ssizes = myStr.Split(whitespace);

It's irksome to define the char[] array everywhere in my code that I want to do this. Is there a more efficient way that doesn't require the creation of the character array (which is prone to error if copied in different places)?


If you just call:

string[] ssize = myStr.Split(null); //Or myStr.Split()

or:

string[] ssize = myStr.Split(new char[0]);

then white-space is assumed to be the splitting character. From the string.Split(char[]) method's documentation page:

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.

Always, always, always read the documentation!
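
To see the documented fallback in action, here is a minimal sketch. The class name SplitDemo and the helper SplitOnAnyWhitespace are made up for illustration; the only library call is the parameterless String.Split() mentioned above.

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: relies on the documented fallback to
    // white-space delimiters when no separator is supplied.
    public static string[] SplitOnAnyWhitespace(string input)
    {
        return input.Split();
    }

    public static void Main()
    {
        // Space, tab, and newline all count as white-space for Char.IsWhiteSpace.
        string myStr = "The quick\tbrown\nfox";
        foreach (string token in SplitOnAnyWhitespace(myStr))
        {
            Console.WriteLine(token);
        }
        // Prints The, quick, brown, and fox on separate lines.
    }
}
```

Note that the sample input has exactly one white-space character between tokens, so no empty entries appear.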


Yes, there is a need for one more answer here!

All the solutions thus far address the rather limited domain of canonical input, to wit: a single whitespace character between elements (though tip of the hat to @cherno for at least mentioning the problem). But I submit that in all but the most obscure scenarios, splitting all of these should yield identical results:

string myStrA = "The quick brown fox jumps over the lazy dog";
string myStrB = "The  quick  brown  fox  jumps  over  the  lazy  dog";
string myStrC = "The quick brown fox      jumps over the lazy dog";
string myStrD = "   The quick brown fox jumps over the lazy dog";

String.Split (in any of the flavors shown throughout the other answers here) simply does not work well unless you attach the RemoveEmptyEntries option with either of these:

myStr.Split(new char[0], StringSplitOptions.RemoveEmptyEntries)
myStr.Split(new char[] {' ','\t'}, StringSplitOptions.RemoveEmptyEntries)

As the illustration reveals, omitting the option yields four different results (labeled A, B, C, and D) vs. the single result from all four inputs when you use RemoveEmptyEntries:

[Illustration not available: split results for inputs A, B, C, and D, with and without RemoveEmptyEntries]
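
Since the comparison is easy to reproduce, here is a rough sketch that counts the tokens returned with and without the option. The class EmptyEntriesDemo and its two helpers are hypothetical names, and the inputs are shortened variants of the four strings above, not the originals.

```csharp
using System;

public static class EmptyEntriesDemo
{
    // Hypothetical helper: token count when empty entries are kept.
    public static int CountDefault(string input)
    {
        return input.Split(new char[0]).Length;
    }

    // Hypothetical helper: token count when empty entries are removed.
    public static int CountNonEmpty(string input)
    {
        return input.Split(new char[0], StringSplitOptions.RemoveEmptyEntries).Length;
    }

    public static void Main()
    {
        string[] inputs =
        {
            "The quick brown fox",       // single spaces
            "The  quick  brown  fox",    // doubled spaces
            "The quick      brown fox",  // one run of six spaces
            "   The quick brown fox"     // three leading spaces
        };

        foreach (string input in inputs)
        {
            Console.WriteLine("{0} vs. {1}", CountDefault(input), CountNonEmpty(input));
        }
        // Prints four lines: 4 vs. 4, 7 vs. 4, 9 vs. 4, 7 vs. 4
    }
}
```

Every extra delimiter in a run, and every leading delimiter, contributes one empty string unless RemoveEmptyEntries is set.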

Of course, if you don't like using options, just use the regex alternative :-)

Regex.Split(myStr, @"\s+").Where(s => s != string.Empty)
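
For completeness, a small sketch of why the Where filter is still needed with the regex: when the input starts with white-space, Regex.Split returns an empty first element. The wrapper name SplitWithRegex is an assumption for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Hypothetical helper wrapping the one-liner above.
    public static string[] SplitWithRegex(string input)
    {
        return Regex.Split(input, @"\s+")
                    .Where(s => s != string.Empty)
                    .ToArray();
    }

    public static void Main()
    {
        string myStrD = "   The quick brown fox";

        // Regex.Split alone yields a leading empty string here, because
        // the input begins with a match; the Where filter drops it.
        Console.WriteLine(Regex.Split(myStrD, @"\s+").Length); // 5
        Console.WriteLine(SplitWithRegex(myStrD).Length);      // 4
    }
}
```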


According to the documentation:

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.

So just call myStr.Split(). There's no need to pass in anything because separator is a params array.


Why don't you use this?

string[] ssizes = myStr.Split(' ', '\t');


Note that adjacent whitespace will NOT be treated as a single delimiter, even when using String.Split(null). If any of your tokens are separated with multiple spaces or tabs, you'll get empty strings returned in your array.

From the documentation:

Each element of separator defines a separate delimiter character. If two delimiters are adjacent, or a delimiter is found at the beginning or end of this instance, the corresponding array element contains Empty.


So don't copy and paste! Extract a function to do your splitting and reuse it.

public static string[] SplitWhitespace (string input)
{
    char[] whitespace = new char[] { ' ', '\t' };
    return input.Split(whitespace);
}

Code reuse is your friend.


You can use

var FirstString = YourString.Split().First();

to split a string and get its first token, i.e. the text before the first whitespace character.
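
One caveat worth sketching: if the string starts with whitespace, Split().First() returns an empty string. The helper FirstToken below is a hypothetical variant that skips empty entries; it is one possible approach, not the only one.

```csharp
using System;
using System.Linq;

public static class FirstTokenDemo
{
    // Hypothetical helper: first non-empty token, or null if there is none.
    public static string FirstToken(string input)
    {
        return input.Split(new char[0], StringSplitOptions.RemoveEmptyEntries)
                    .FirstOrDefault();
    }

    public static void Main()
    {
        string yourString = "  quick brown fox";

        // Two leading spaces produce empty entries, so First() is empty.
        Console.WriteLine("[" + yourString.Split().First() + "]"); // []
        Console.WriteLine("[" + FirstToken(yourString) + "]");     // [quick]
    }
}
```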


Can't you do it inline?

var sizes = subject.Split(new char[] { ' ', '\t' });

Otherwise, if you do this exact thing often, you could always create a constant or similar field containing that char array.

As others have noted, the documentation says you can also pass null or an empty array. When you do that, whitespace characters are used as the delimiters automatically.

var sizes = subject.Split(null);


Why don't you just do this:

var ssizes = myStr.Split(" \t".ToCharArray());

It seems there is a method String.ToCharArray() in .NET 4.0!

EDIT: As VMAtm has pointed out, the method already existed in .NET 2.0!


If repeating the same code is the issue, write an extension method on the String class that encapsulates the splitting logic.
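
A minimal sketch of such an extension method might look like the following. The names StringSplitExtensions and SplitOnWhitespace are assumptions, and dropping empty entries is a design choice here, not something the suggestion above prescribes.

```csharp
using System;

public static class StringSplitExtensions
{
    // Hypothetical extension method: splits on any white-space character
    // and drops the empty entries produced by runs of white-space.
    public static string[] SplitOnWhitespace(this string input)
    {
        if (input == null)
        {
            throw new ArgumentNullException("input");
        }
        return input.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
    }
}

public static class Program
{
    public static void Main()
    {
        string myStr = "The quick \t brown  fox";
        string[] ssizes = myStr.SplitOnWhitespace();
        Console.WriteLine(string.Join("|", ssizes)); // The|quick|brown|fox
    }
}
```

Keeping the delimiter choice in one place avoids the copy-and-paste problem raised in the question.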


You can just do:

string myStr = "The quick brown fox jumps over the lazy dog";
string[] ssizes = myStr.Split(' ');

MSDN has more examples and references:

http://msdn.microsoft.com/en-us/library/b873y76a.aspx
