Is it better to use a regex or a StringTokenizer to separate the author and title in this string:
William Faulkner - 'Light In August'
Is this the simplest regex that would work?
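Pattern pattern = Pattern.compile("^\\s*([^-]+)-(.*)$");
Matcher matcher = pattern.matcher("William Faulkner - 'Light In August'");
if (matcher.matches()) {  // group() throws IllegalStateException unless a match was attempted first
    String author = matcher.group(1).trim();
    String bookTitle = matcher.group(2).trim();  // needs the second capturing group added above
}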
Is that overkill, or is there a simpler way to do this with a StringTokenizer?
Basically, I'm looking for the most transparent and maintainable solution, since I don't have a good understanding of regex and got help with the one above.
How much control do you have over the input? Can you guarantee that author and title will always be separated by " - " (a space, a dash, and a space)? Do you know for sure that the author won't contain " - "? And so on.
If the input is quite rigid, then you can simply use String#split(), which should make it very clear what you're doing. Don't use a StringTokenizer; its own Javadoc advises against it:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
Mark Byers' answer shows you how to use split().
However, if you have to worry about more variation in the input (e.g., can the amount of whitespace around the dash vary, or be missing entirely?), then a regex will be more concise. The tradeoff is code readability and clarity of intent.
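As a rough sketch of that middle ground, split() itself accepts a regex, so variable whitespace around the dash can be absorbed without a full Pattern/Matcher. The class and method names here are made up for illustration, and it assumes the first dash in the input is the separator:

```java
import java.util.Arrays;

class LooseDashSplit {
    // Splits on the first dash, tolerating any amount of whitespace
    // (including none) on either side of it.
    static String[] splitAuthorTitle(String input) {
        String[] parts = input.trim().split("\\s*-\\s*", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("No dash separator in: " + input);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAuthorTitle("William Faulkner-'Light In August'")));
        System.out.println(Arrays.toString(splitAuthorTitle("William Faulkner   -  'Light In August'")));
    }
}
```

Because it splits on the first dash it finds, this still breaks on an author name that contains a hyphen.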
It depends on what the input looks like. Your regex, for example, would fail on author names that contain a hyphen.
Perhaps something like
Pattern.compile("^\\s*(.*?)\\s+-\\s+'(.*)'\\s*$")
might fit a little better.
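Here is one way that pattern could be wired up, as a sketch only. The class name, the method name, and the choice to return null on a non-match are my own assumptions, and the second input is just an example of a hyphenated author name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedTitleParser {
    // Author, then whitespace-dash-whitespace, then a title in single quotes.
    private static final Pattern AUTHOR_TITLE =
            Pattern.compile("^\\s*(.*?)\\s+-\\s+'(.*)'\\s*$");

    // Returns {author, title}, or null if the input does not have that shape.
    static String[] parse(String input) {
        Matcher m = AUTHOR_TITLE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] a = parse("William Faulkner - 'Light In August'");
        System.out.println(a[0] + " | " + a[1]);
        String[] b = parse("Jean-Paul Sartre - 'No Exit'");
        System.out.println(b[0] + " | " + b[1]);
    }
}
```

Since the dash must have whitespace on both sides, the hyphen inside the author name is not mistaken for the separator. Note that this version also strips the quotes from the title.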
How about using String.split?
String s = "William Faulkner - 'Light In August'";
String[] parts = s.split(" - ", 2);
String author = parts[0];
String title = parts[1];
One thing to watch out for is that some authors' names and book titles contain hyphens so splitting just on a hyphen won't always work in general.
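To make the difference concrete, here is a small comparison of splitting on a bare hyphen versus on " - " with a limit of 2. The class and method names are hypothetical, and the input is just an example of a hyphenated author name:

```java
import java.util.Arrays;

class HyphenSplitDemo {
    // Naive: every hyphen is treated as a separator.
    static String[] splitOnHyphen(String s) {
        return s.split("-");
    }

    // Stricter: only the first " - " is treated as the separator.
    static String[] splitOnSpacedDash(String s) {
        return s.split(" - ", 2);
    }

    public static void main(String[] args) {
        String s = "Jean-Paul Sartre - 'No Exit'";
        System.out.println(Arrays.toString(splitOnHyphen(s)));      // three pieces, author cut in two
        System.out.println(Arrays.toString(splitOnSpacedDash(s)));  // two pieces, author intact
    }
}
```

The limit of 2 also keeps a title that itself contains " - " in one piece, though an author containing " - " would still be split in the wrong place.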