I find myself constantly changing my utility methods to improve string handling in my Java code. For example, I have changed a bunch of my search-and-replace code to use the Commons StringUtils.replace method, and I have upgraded code from Java 1.4 to 1.6 to make it type-safe.
I would like to ask you to post the practices you employ so that your string operations run smoothly, securely, and fast, and so that reusing your code stays simple and elegant.
And if you have a pattern for it, even better.
Also, if you know about any new Java 1.7 features that are worthwhile, please post them here.
What to do when working with very large strings? Break them up?
What to keep in mind when using regex on strings?
How to utilize caching patterns (and which are the best ones) when working with loop-intensive algorithms?
Are there libraries that offer functions similar to grep, ack, diff, spell checking, filtering curse words (or any words), ...?
When possible, you should use StringBuilder, especially for concatenating strings. The performance improvement can be very large.
StringBuilder sb = new StringBuilder("Mat");
sb.append(" ");
sb.append("Bank");
// oops
int i = sb.indexOf("k");
sb.insert(i, 'i'); // character
String mb = sb.toString();
// result = "Mat Banik"
In a large program, repeated use of s1+s2 (especially inside loops, where each pass copies everything built so far) is one of the worst performance hits and one of the simplest to cure.
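To make the loop case concrete, here is a minimal sketch of joining strings with a single StringBuilder instead of repeated s1+s2. The class name JoinDemo and the helper join are hypothetical names chosen for illustration; they are not from any library.

```java
public class JoinDemo {
    // Hypothetical helper: joins items with a separator using one StringBuilder,
    // so no intermediate String is created on each loop pass.
    static String join(String[] items, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(items[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}, ", "));
    }
}
```

The equivalent loop written as result = result + separator + items[i] builds a fresh String on every iteration, which is where the cost comes from.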
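StringBuilder has many of the features of String, such as charAt, indexOf and substring (note that substring returns a new String). When an API requires a String, you can convert with toString(); APIs that accept a CharSequence, such as Pattern/Matcher, can take the StringBuilder directly.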
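As a small illustration of combining StringBuilder with Pattern/Matcher, the sketch below relies on Pattern.matcher accepting any CharSequence, so a StringBuilder can be passed without calling toString() first. The class BuilderRegexDemo and the method countWords are hypothetical names; compiling the Pattern once into a constant is a common habit rather than something the answer above prescribes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BuilderRegexDemo {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Hypothetical helper: counts word-like tokens in any CharSequence.
    static int countWords(CharSequence text) {
        Matcher matcher = WORD.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("Mat").append(' ').append("Banik");
        System.out.println(countWords(sb)); // the StringBuilder is matched directly
    }
}
```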
Security-wise, make sure to escape properly when you combine strings of different content types, for example, when concatenating a plain text string with a string of HTML to produce a string of HTML.
http://commons.apache.org/lang/api-2.5/org/apache/commons/lang/StringEscapeUtils.html has a bunch of useful escaping functions and their inverses.
For example, to avoid XSS attacks, you can encode output like:
void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    String message = req.getParameter("message");
    // Unless I check, in code, that an input is of some other content type,
    // I need to conservatively assume it's plain text.
    ...
    resp.setContentType("text/html;charset=UTF-8");
    ...
    // Since resp is a channel with content-type text/html,
    // I need to only write HTML to it.
    resp.getWriter().write(
        "<h2>"                                    // This is already HTML.
        + StringEscapeUtils.escapeHtml(message)   // plain text -> innocuous HTML
        + "</h2>"                                 // Also already HTML.
    );
    ...
}
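To show what the plain text -> HTML step does, here is a self-contained sketch of a minimal escaper. It is a hypothetical stand-in written for illustration, not the Commons Lang implementation, and it only handles the five characters that are significant in HTML text and attribute values; in real code the library function is the safer choice.

```java
public class HtmlEscapeDemo {
    // Hypothetical minimal escaper: replaces the characters that could change
    // HTML structure with their entity equivalents.
    static String escapeHtml(String plainText) {
        StringBuilder sb = new StringBuilder(plainText.length() + 16);
        for (int i = 0; i < plainText.length(); i++) {
            char c = plainText.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String message = "<script>alert('x')</script>"; // untrusted plain text
        System.out.println("<h2>" + escapeHtml(message) + "</h2>");
    }
}
```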
Remember that on older JVMs (before Java 7 update 6) String.substring() only creates a new "view" into the original char array. So, when taking a small substring of a large String and storing it, it's a good idea to create a new String of just the substring. This avoids a hard-to-track-down memory leak where your substrings really hold the entire original string.
Beware, though, that the String constructor behaves differently on IBM's and Sun's JVMs. Sun's does what you expect: it creates a new char array. IBM's does not; you have to get the char array first. To work around this, I had to write the following ugly bit of code:
private static final boolean IS_IBM_JVM = System.getProperty("java.vm.vendor").startsWith("IBM");
...
if (IS_IBM_JVM) {
    substring = new String(substring.toCharArray());
}
else {
    substring = new String(substring);
}