Censoring selected words (replacing them with ****) using a single replaceAll?

Developer https://www.devze.com 2023-01-01 21:44 Source: web

Related topic: regex

I'd like to censor some words in a string by replacing each character in the word with a "*". Basically, I would like to do something like this (pseudocode):

String s = "lorem ipsum dolor sit";
s = s.replaceAll("ipsum|sit", /* "*" repeated $0.length() times */);

so that the resulting s equals "lorem ***** dolor ***".

I know how to do this with repeated replaceAll invocations, but I'm wondering: is this possible with a single replaceAll?

Update: It's part of a research case study, and the reason is basically that I would like to get away with a one-liner, as it simplifies the generated bytecode a bit. It's not for a serious webpage or anything.

Here's a modification to aioobe's answer that uses nested lookaround assertions (a lookahead inside a lookbehind) instead of a nested loop that generates separate assertions for every character:

import java.util.regex.Pattern;

public class Censor {
    public static void main(String... args) {
        String s = "lorem ipsum dolor sit blah $10 bleh";
        System.out.println(s.replaceAll(censorWords("ipsum", "sit", "$10"), "*"));
        // prints "lorem ***** dolor *** blah *** bleh"
    }

    // One alternative per word: match a single character that lies at most
    // w.length()-1 positions after a place where the (quoted) word begins.
    public static String censorWords(String... words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            if (sb.length() > 0) sb.append("|");
            sb.append(
               String.format("(?<=(?=%s).{0,%d}).",
                  Pattern.quote(w),
                  w.length()-1
               )
            );
        }
        return sb.toString();
    }
}

Some key points:

- StringBuilder.append in a loop instead of String +=
- Pattern.quote to escape regex metacharacters such as $ or \ in the censored words
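A quick check of why Pattern.quote matters for a word like $10 from the example above. This is a small sketch; the class name QuoteDemo is just for illustration:

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    public static void main(String[] args) {
        String s = "lorem ipsum dolor sit blah $10 bleh";

        // Unquoted, '$' is an end-of-input anchor, so the regex "$10" can never match here
        System.out.println(s.replaceAll("$10", "***"));
        // lorem ipsum dolor sit blah $10 bleh

        // Pattern.quote("$10") yields \Q$10\E, which matches the text literally
        System.out.println(s.replaceAll(Pattern.quote("$10"), "***"));
        // lorem ipsum dolor sit blah *** bleh
    }
}
```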

That said, this is not the best solution to the problem. It's just a fun regex game to play, really.

How it works

We want to replace with "*", so we have to match one character at a time. The question is which character.

It's any character such that, if you go back far enough (at most the word's length minus one) and then look forward, you see a censored word.

Here's the regex in more abstract form:

(?<=(?=something).{0,N})

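This matches positions from which, going back at most N characters, you can look ahead and see something. The actual pattern appends a . to consume the single character at such a position and sets N to the word's length minus 1, so exactly the characters inside an occurrence of the word are matched.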
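To see which positions qualify in practice, here is a sketch that builds the one-word pattern the same way censorWords does and lists the start index of every character it matches. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundPositions {
    // Start indices of every single character that the one-word pattern matches in s
    static List<Integer> matchedPositions(String s, String word) {
        // (?<=(?=word).{0,N}). with N = word.length() - 1, as in censorWords
        String re = String.format("(?<=(?=%s).{0,%d}).",
                Pattern.quote(word), word.length() - 1);
        Matcher m = Pattern.compile(re).matcher(s);
        List<Integer> positions = new ArrayList<>();
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // "sit" starts at index 18 of the question's example string
        System.out.println(matchedPositions("lorem ipsum dolor sit", "sit"));
        // [18, 19, 20]
    }
}
```

Index 17 (the space before "sit") is not matched: going back up to two characters from it never reaches a position where "sit" begins.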

It's possible using zero-width lookarounds:

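public class Test {
    public static void main(String... args) {
        String s = "lorem ipsum dolor sit";
        System.out.println(s.replaceAll(censorWords("ipsum", "sit"), "*"));
    }

    // For every character of every word, emit an alternative that matches that
    // character only when it is preceded by the part of the word before it and
    // followed by the part after it. Assumes the words contain no regex metacharacters.
    public static String censorWords(String... words) {
        String re = "";
        for (String w : words)
            for (int i = 0; i < w.length(); i++)
                re += String.format("|((?<=%s)%s(?=%s))",
                        w.substring(0, i), w.charAt(i), w.substring(i + 1));
        return re.substring(1); // drop the leading "|"
    }
}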

Prints

lorem ***** dolor ***

The generated regular expression isn't pretty but it does the trick :-)
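To make "isn't pretty" concrete, here is a sketch that reproduces the same construction (using a StringBuilder instead of String +=) and prints the regex generated for the single word "sit". The class name ShowGeneratedRegex is hypothetical:

```java
public class ShowGeneratedRegex {
    // Same per-character lookbehind/lookahead construction as censorWords above
    static String censorWords(String... words) {
        StringBuilder re = new StringBuilder();
        for (String w : words) {
            for (int i = 0; i < w.length(); i++) {
                if (re.length() > 0) re.append('|');
                re.append(String.format("((?<=%s)%s(?=%s))",
                        w.substring(0, i), w.charAt(i), w.substring(i + 1)));
            }
        }
        return re.toString();
    }

    public static void main(String[] args) {
        System.out.println(censorWords("sit"));
        // ((?<=)s(?=it))|((?<=s)i(?=t))|((?<=si)t(?=))
        System.out.println("lorem ipsum dolor sit".replaceAll(censorWords("ipsum", "sit"), "*"));
        // lorem ***** dolor ***
    }
}
```

The pattern grows with the total number of characters across all censored words, which is what the nested-lookaround variant avoids.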

This is not a good way to censor text. Jeff Atwood has a great post about censoring in this way.

http://www.codinghorror.com/blog/2008/10/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea.html

Unless you are going to spend a lot of time on this censoring feature, it will probably end up censoring things that shouldn't be censored.
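A small illustration of that kind of false positive, reusing the lookaround-based censorWords from the first answer (copied here so the sketch is self-contained; the class name is hypothetical). Because the pattern has no word boundaries, "sit" is also censored inside "situation":

```java
import java.util.regex.Pattern;

public class FalsePositiveDemo {
    // Copy of the nested-lookaround censorWords from the first answer
    static String censorWords(String... words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            if (sb.length() > 0) sb.append("|");
            sb.append(String.format("(?<=(?=%s).{0,%d}).",
                    Pattern.quote(w), w.length() - 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The substring "sit" at the start of "situation" matches too
        System.out.println("the situation".replaceAll(censorWords("sit"), "*"));
        // the ***uation
    }
}
```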

Another note:
Turning the Java code into a one-liner will not necessarily simplify the bytecode. By that logic, you could put your censoring code into a single method and just call that.

Java's replace methods don't take a callback as an argument, so this isn't easy. But since profanity filters are mostly used on the web, I assume you can use JavaScript for that.

var s = "this is some sample text to play with";
var r = s.replace(/\b(some|sample|to)\b/g, function() {
  var star = "*";
  var len = arguments[1].length;
  while(--len)
    star += "*";
  return star;
});
console.log(r);//this is **** ****** text ** play with
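For what it's worth, newer Java versions do have a callback form: Matcher.replaceAll(Function&lt;MatchResult, String&gt;), added in Java 9. A sketch of the same whole-word censoring, assuming Java 11 or later (for String.repeat); the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class CensorWithCallback {
    // Whole words only, like the \b...\b in the JavaScript version
    private static final Pattern WORDS = Pattern.compile("\\b(some|sample|to)\\b");

    static String censor(String s) {
        // The function's result is used as a replacement string; a run of "*"
        // contains no $ or \, so it needs no escaping.
        return WORDS.matcher(s).replaceAll(m -> "*".repeat(m.group().length()));
    }

    public static void main(String[] args) {
        System.out.println(censor("this is some sample text to play with"));
        // this is **** ****** text ** play with
    }
}
```

This gives the single-call form the question asks for, at the cost of going through Pattern and Matcher rather than String.replaceAll.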