I need to be able to split an input String by commas, semi-colons or white-space (or a mix of the three). I would also like to treat multiple consecutive delimiters in the input as a single delimiter. Here's what I have so far:
String regex = "[,;\\s]+";
return input.split(regex);
This works, except when the input string starts with one of the delimiter characters, in which case the first element of the result array is an empty String. I do not want my result to have empty Strings, so that something like ",,,,ZERO; , ;;ONE ,TWO;," returns just a three-element array containing the capitalized Strings.
Is there a better way to do this than stripping out any leading characters that match my regex prior to invoking String.split?
Thanks in advance!
No, there isn't. split() only discards trailing empty strings, which is what a limit of 0 as the second parameter does (the one-argument split(regex) uses the same limit):
return input.split(regex, 0);
but for leading delimiters, you'll have to strip them first:
return input.replaceFirst("^"+regex, "").split(regex, 0);
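As a minimal sketch of the strip-then-split approach above, wrapped in a runnable class: the class name DelimiterSplit and method name splitTokens are made up for illustration, and the patterns are precompiled only to avoid recompiling the regex on every call. One edge case worth guarding is an input made up entirely of delimiters, because splitting the resulting empty string returns a one-element array holding "".

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the question.
class DelimiterSplit {
    // Same delimiter class as in the question: commas, semicolons, whitespace.
    private static final Pattern DELIMITERS = Pattern.compile("[,;\\s]+");
    // The same class anchored to the start of the input.
    private static final Pattern LEADING = Pattern.compile("^[,;\\s]+");

    static String[] splitTokens(String input) {
        // Remove leading delimiters so split() does not emit a leading "".
        String stripped = LEADING.matcher(input).replaceFirst("");
        // Splitting "" would yield [""], so return an empty array instead.
        if (stripped.isEmpty()) {
            return new String[0];
        }
        // Pattern.split(CharSequence) uses limit 0, which drops trailing empties.
        return DELIMITERS.split(stripped);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTokens(",,,,ZERO; , ;;ONE ,TWO;,")));
    }
}
```

With the sample input from the question, this should give just the three capitalized tokens.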
If by "better" you mean higher performance, then you might want to try creating a regular expression that matches what you want to match, using Matcher.find in a loop, and pulling out the matches as you find them. This saves modifying the string first. But measure it for yourself to see which is faster for your data.
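A rough sketch of that Matcher.find loop, assuming a token is any run of characters that is not a comma, semicolon or whitespace (the negation of the delimiter class in the question). The names TokenFinder and findTokens are hypothetical, and no performance claim is implied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class TokenFinder {
    // Matches the tokens themselves rather than the delimiters between them.
    private static final Pattern TOKEN = Pattern.compile("[^,;\\s]+");

    static List<String> findTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        // Each find() advances to the next run of non-delimiter characters,
        // so leading, trailing and repeated delimiters never produce a token.
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(findTokens(",,,,ZERO; , ;;ONE ,TWO;,"));
    }
}
```

Because the pattern describes tokens instead of separators, empty strings cannot appear in the result, and an input with no tokens gives an empty list.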
If by "better" you mean simpler, then no, I don't think there is a simpler way than the one you suggested: removing the leading separators before applying the split.
Pretty much all splitting facilities built into the JDK are broken one way or another. You'd be better off using a third-party class such as Guava's Splitter, which is both flexible and correct in how it handles empty tokens and whitespace:
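// Recent Guava versions use CharMatcher.whitespace() in place of the old CharMatcher.WHITESPACE constant.
Splitter.on(CharMatcher.anyOf(";,").or(CharMatcher.whitespace()))
        .omitEmptyStrings()
        .split(",,,ZERO;,ONE TWO");
will yield an Iterable<String> containing "ZERO", "ONE", "TWO".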
You could also potentially use StringTokenizer to build the list, depending on what you need to do with it:
StringTokenizer st = new StringTokenizer(",,,ZERO;,ONE TWO", ",; ", false);
while (st.hasMoreTokens()) {
    String str = st.nextToken();
    // add to list, process, etc...
}
As a caveat, however, you'll need to define each potential whitespace character separately in the second argument to the constructor.
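To illustrate that caveat, here is a small sketch that spells out the whitespace characters explicitly. The set used (space, tab, newline, carriage return, form feed) is an assumption borrowed from StringTokenizer's own default delimiters; it is not identical to the regex class \s, which also matches the vertical tab. The names TokenizerSplit and tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class; the names are illustrative only.
class TokenizerSplit {
    // Comma and semicolon plus an assumed whitespace set: space, tab,
    // newline, carriage return and form feed. Each one is listed by hand.
    private static final String DELIMITERS = ",; \t\n\r\f";

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = false: delimiters only separate tokens and are skipped.
        StringTokenizer st = new StringTokenizer(input, DELIMITERS, false);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(",,,ZERO;,ONE\tTWO\n"));
    }
}
```

StringTokenizer treats every character of the delimiter string as a separate single-character delimiter and skips runs of them, so consecutive and leading delimiters do not produce empty tokens.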