What method would you suggest for normalizing text in Java? For example:
String raw = " This is\n a test\n\r ";
String txt = normalize(raw);
assert txt.equals("This is a test");
I'm thinking about the StringUtils .replace() and .strip() methods, but maybe there is an easier way.
Try the following if it is just a matter of whitespace:
String txt = raw.replaceAll("\\s+", " ").trim();
I see that your string actually contains newlines that you want to get rid of. In that case I would recommend using a regex, like so:
Pattern.compile("\\s+").matcher(text).replaceAll(" ").trim();
You can always store the compiled regex and reuse it for better performance.
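As a rough sketch of what storing the compiled regex could look like: the class name WhitespaceNormalizer and the constant WHITESPACE_RUN below are made up for illustration and are not from the original answer. The pattern is compiled once in a static field and reused on every call.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this example.
final class WhitespaceNormalizer {
    // Compiled once at class load; Pattern instances are immutable and safe to share.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    private WhitespaceNormalizer() {}

    // Collapses every run of whitespace to one space, then trims both ends.
    static String normalize(String text) {
        return WHITESPACE_RUN.matcher(text).replaceAll(" ").trim();
    }
}
```

Calling WhitespaceNormalizer.normalize(raw) with the string from the question should give "This is a test".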
Apache Commons Lang finally added this function: org.apache.commons.lang3.StringUtils.normalizeSpace(String str)
It depends a little on exactly what you want to strip. If it's certain specific characters, then replaceAll() would be the way to go, as posted by @Yaneeve. If the needs are more general, you might want to look at normalizing the string using java.text.Normalizer.
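For the more general case, here is a minimal sketch of java.text.Normalizer, assuming that "normalize" means Unicode normalization to NFC (composed form). The class name UnicodeNormalizer and the method toNfc are hypothetical. Note that this does not touch whitespace, so it complements the regex approach and does not replace it.

```java
import java.text.Normalizer;

// Hypothetical helper class; the name is an assumption for this example.
final class UnicodeNormalizer {
    private UnicodeNormalizer() {}

    // Converts the input to Unicode Normalization Form C, so that a base letter
    // followed by a combining mark becomes the single precomposed character.
    static String toNfc(String raw) {
        return Normalizer.normalize(raw, Normalizer.Form.NFC);
    }
}
```

For example, toNfc("e\u0301") (an "e" followed by a combining acute accent) should return the single character "\u00e9".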
To remove the leading and trailing spaces, you're looking for String#trim():
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#trim()
If normalization means replacing sequences of spaces, tabs, newlines, and linefeeds, then I'd consider using a simple regular expression and String.split() to create separate words, then appending them in a StringBuilder with the spacing you'd like in between. If performance really matters, another approach would be to simply loop over the String's characters, looking at each one and deciding whether to append it to a StringBuilder or to discard it.
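Both ideas can be sketched roughly as follows. The class and method names are invented for this example. The split variant uses String.join instead of an explicit StringBuilder, and the loop variant relies on Character.isWhitespace, which does not classify exactly the same characters as the regex \s.

```java
// Hypothetical helper class; all names here are assumptions for this example.
final class LoopNormalizer {
    private LoopNormalizer() {}

    // Variant 1: split into words on whitespace runs, then join with one space.
    static String viaSplit(String raw) {
        return String.join(" ", raw.trim().split("\\s+"));
    }

    // Variant 2: a single pass over the characters, with no regex involved.
    static String viaLoop(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        boolean pendingSpace = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (Character.isWhitespace(c)) {
                // Remember the gap, but only once some word has been written,
                // so leading whitespace never produces a space.
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                    pendingSpace = false;
                }
                sb.append(c);
            }
        }
        // A gap still pending at the end is dropped, which trims the tail.
        return sb.toString();
    }
}
```

With the string from the question, both viaSplit(raw) and viaLoop(raw) should return "This is a test".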
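import java.util.Scanner;

private static String normalize(String raw) {
    StringBuilder sb = new StringBuilder();
    // Scanner splits on whitespace by default, so next() returns one word at a time.
    try (Scanner scanner = new Scanner(raw)) {
        while (scanner.hasNext()) {
            // Add the separator before every word except the first, so an empty
            // or whitespace-only input no longer fails on deleteCharAt(-1).
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(scanner.next());
        }
    }
    return sb.toString();
}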