Apply a function to the replacement string when matching a regex in Java

Developer https://www.devze.com 2023-02-13 18:47 Source: Internet
I would like to replace some patterns in a String by the result of calling a function on the detected groups.

More specifically, I would like, for example, to transform

String input = "normal <upper> normal <upper again> normal";

into

String output = "normal UPPER normal UPPER AGAIN normal";

The regex \<(.*?)\> should detect the pattern I want to transform, but using

output = input.replaceAll("\\<(.*?)\\>", "$1".toUpperCase());

doesn't work, because "$1".toUpperCase() is evaluated before replaceAll() is ever called. Upper-casing the literal string "$1" changes nothing, so the method simply receives "$1" and the group is inserted unchanged.
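To make the evaluation order concrete, here is a minimal sketch. The class and method names (EagerArgumentDemo, naive) are made up for illustration; only String.replaceAll() and String.toUpperCase() come from the question.

```java
public class EagerArgumentDemo {
    // Same call as in the question: the argument expression runs first.
    static String naive(String input) {
        // "$1".toUpperCase() yields "$1" (there are no letters to change),
        // so this is exactly input.replaceAll("\\<(.*?)\\>", "$1").
        return input.replaceAll("\\<(.*?)\\>", "$1".toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println("$1".toUpperCase());
        // prints: $1
        System.out.println(naive("normal <upper> normal <upper again> normal"));
        // prints: normal upper normal upper again normal
    }
}
```

The angle brackets are stripped, but the captured text keeps its original case, because the regex engine never sees anything other than the plain template "$1".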

Besides, the method I want to apply has to be called with the replacement string as an argument; thus the "wrong naive way" would be something more like

output = input.replaceAll("\\<(.*?)\\>", transform("$1"));

Do you know of any trick to do this?


The idiomatic way to do it is slightly verbose:

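// Requires java.util.regex.Matcher and java.util.regex.Pattern.
Matcher m = Pattern.compile("\\<(.*?)\\>").matcher(input);
StringBuffer b = new StringBuffer();
while (m.find()) {
    // group(1) is the text between the angle brackets
    m.appendReplacement(b, transform(m.group(1)));
}
m.appendTail(b);  // copy whatever follows the last match
output = b.toString();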


An example is shown here

It's sad that Java makes you create a separate buffer and build it up in an m.find() loop, only to reassign the result to the input string.

In Perl it's done in-line within the engine: $str =~ s/<(.*?)>/'<'.upper($1).'>'/seg; but that's just Perl, a mystery unto itself.
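For what it's worth, newer JDKs get closer to the Perl one-liner. The sketch below assumes Java 9 or later, where Matcher.replaceAll() has an overload taking a Function<MatchResult, String>. The class name, the replaceTags helper and the transform parameter are placeholders for illustration, not something from the question. The function's result is still parsed as a replacement template, hence the Matcher.quoteReplacement() call.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FunctionalReplaceDemo {
    private static final Pattern TAG = Pattern.compile("<(.*?)>");

    // 'transform' stands in for whatever function the asker wants to apply.
    static String replaceTags(String input, Function<String, String> transform) {
        return TAG.matcher(input).replaceAll(
                // Java 9+: called once per match with that match's MatchResult.
                mr -> Matcher.quoteReplacement(transform.apply(mr.group(1))));
    }

    public static void main(String[] args) {
        String input = "normal <upper> normal <upper again> normal";
        System.out.println(replaceTags(input, String::toUpperCase));
        // prints: normal UPPER normal UPPER AGAIN normal
    }
}
```

On Java 8 and earlier this overload is not available, so the explicit find()/appendReplacement() loop remains the way to go there.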


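// Requires java.util.regex.Matcher and java.util.regex.Pattern.
Pattern p = Pattern.compile("<([^<>]+)>");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // Stage 1: copy the text before the match, with an empty replacement
    m.appendReplacement(sb, "");
    // Stage 2: append the transformed group ourselves, as literal text
    sb.append(transform(m.group(1)));
}
m.appendTail(sb);  // copy whatever follows the last match
output = sb.toString();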

The main improvement over @axtavt's answer is the two-stage append process.

appendReplacement() processes the replacement string looking for dollar signs (which indicate group references) and backslashes (which are used to escape dollar signs and backslashes). But any dollar signs in our replacement strings should be treated literally; treating them as group-reference sigils will result in garbage output or runtime exceptions. So we disable that processing by passing an empty string to appendReplacement() and appending the actual replacement to the StringBuffer ourselves.
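Here is a small sketch of the difference, using a transform whose output contains a dollar sign. The class name, the oneStage/twoStage helpers and the addDollar transform are hypothetical and only there to trigger the problem.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoStageAppendDemo {
    private static final Pattern TAG = Pattern.compile("<([^<>]+)>");

    // One-stage: the transformed text is parsed as a replacement template.
    static String oneStage(String input, Function<String, String> transform) {
        Matcher m = TAG.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, transform.apply(m.group(1)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Two-stage: empty template first, then the transformed text as a literal.
    static String twoStage(String input, Function<String, String> transform) {
        Matcher m = TAG.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "");
            sb.append(transform.apply(m.group(1)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Function<String, String> addDollar = s -> "$" + s; // hypothetical transform
        System.out.println(twoStage("cost: <price>", addDollar));
        // prints: cost: $price
        try {
            oneStage("cost: <price>", addDollar);
        } catch (IllegalArgumentException e) {
            // "$price" is read as a (malformed) group reference
            System.out.println("one-stage failed: " + e.getMessage());
        }
    }
}
```

If the transform can never produce a dollar sign or a backslash, both versions behave the same; the two-stage form just removes the need to reason about that.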

Note: the Matcher.quoteReplacement() method, which I mentioned in a comment to another answer, would do the job too. The two-stage approach is possible because we're looping manually, not calling replaceAll() or replaceFirst(), and it's both clearer (IMO) and more efficient.
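For comparison, a sketch of the quoteReplacement() variant. Matcher.quoteReplacement() is the real JDK method; the class name, the replaceQuoted helper and the transforms in main are made up for illustration.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    private static final Pattern TAG = Pattern.compile("<([^<>]+)>");

    static String replaceQuoted(String input, Function<String, String> transform) {
        Matcher m = TAG.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement() escapes '$' and '\' so the text is taken literally
            m.appendReplacement(sb, Matcher.quoteReplacement(transform.apply(m.group(1))));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceQuoted("cost: <price>", s -> "$" + s));
        // prints: cost: $price
        System.out.println(replaceQuoted("path: <temp>", s -> "C:\\" + s));
        // prints: path: C:\temp
    }
}
```

This does a little more work per match (escape, then unescape inside appendReplacement()), which is the efficiency point made above.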
