This is a bit of an edge case, but I have run into the following with the regular expression "()": when it is used to split a string into a string[] array, the results look odd to me. For example, this line of code:
string[] res = new Regex("()").Split("hi!");
sets res to an array of 9 (!) elements: ["","","h","","i","","!","",""]
I was expecting it to return these 5 elements instead: ["h", "", "i", "", "!"]. I need this particular result for compatibility with another regexp library.
My question is: could this behavior be due to some missing option on the regular expression object, an encoding problem, or something similar? Or is it well defined and definitely the way it should work? Also, is there a way to force it to return the second (expected) result instead?
I've marked the positions where your regex matches with the | character:
"|h|i|!|"
Split returns an array whose elements are all either between two adjacent matches, or between the start of the string and the first match, or between the last match and the end of the string. It returns these in the order they occurred in the string. That gives this result:
["","h","i","!",""]
This explains 5 of the 9 array elements.
However, "if capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array." (direct quote from MSDN: http://msdn.microsoft.com/en-us/library/ze12yx1d.aspx)
In this case, the captured text is the empty string. Since we had 4 matches, this explains the other 4 elements in your result.
Thus, the full result is:
["","","h","","i","","!","",""]
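To see the two contributions separately, here is a minimal sketch that splits the same input once with an empty non-capturing group and once with the empty capturing group. The `SplitDemo` class and its `Show` helper are made up for this illustration and only exist to make empty strings visible in the output.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Hypothetical helper: wraps each element in quotes so that
    // empty strings show up in the printed array.
    public static string Show(string[] parts) =>
        "[" + string.Join(",", Array.ConvertAll(parts, p => "\"" + p + "\"")) + "]";

    static void Main()
    {
        // Empty non-capturing group: only the pieces between matches.
        // prints ["","h","i","!",""]
        Console.WriteLine(Show(new Regex("(?:)").Split("hi!")));

        // Empty capturing group: each of the 4 matches also adds its
        // (empty) captured text.
        // prints ["","","h","","i","","!","",""]
        Console.WriteLine(Show(new Regex("()").Split("hi!")));
    }
}
```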
I'd say the nine elements are correct because the expression also matches before "h" and after "!".
To avoid matching at the beginning or end you could add lookahead/behind to make sure there are more characters around the empty match: "(?<=.)()(?=.)"
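As a quick check of the lookaround pattern, here is a small sketch that applies it to the input from the question. The `InnerSplitDemo` class and `SplitBetweenChars` method are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class InnerSplitDemo
{
    // Hypothetical helper: splits only at positions that have a character
    // on both sides, so the start and end of the string never match.
    public static string[] SplitBetweenChars(string input) =>
        new Regex("(?<=.)()(?=.)").Split(input);

    static void Main()
    {
        string[] res = SplitBetweenChars("hi!");
        Console.WriteLine(res.Length);             // prints 5
        Console.WriteLine(string.Join("|", res));  // prints h||i||!
    }
}
```

With only two matches (between "h" and "i", and between "i" and "!"), the result is the three pieces plus two empty captures, i.e. ["h", "", "i", "", "!"]. Note that "." does not match a newline unless RegexOptions.Singleline is set, so positions next to a "\n" would not be split with the pattern as written.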