Java Regex check previous char before splitting

https://www.devze.com 2023-02-03 08:13 Source: Internet
I have a string like this

This:string:must~:be:split:when:previous:char:is:not~:this

I need to split the line with the delimiter ":" but only if the character before the delimiter is NOT "~"

I have the following regex now:

String[] split = str.split(":(?<!~:)");

It works, but since I arrived at it purely by trial and error, I'm not convinced that it's the most efficient way of doing it. This function will also be called repeatedly on large strings, so performance matters. Is there a more efficient way of doing it?


A slightly simpler approach is this:

(?<!~):

That way the colon isn't examined twice (once by the match itself and again inside the lookbehind). I doubt you'll see any difference in performance, though. It is also very simple to write this without a regular expression: look for the next colon and check whether a tilde precedes it.
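To make the non-regex idea concrete, here is a minimal sketch of it next to the simpler lookbehind pattern compiled once. The class name `TildeAwareSplit` and both method names are made up for illustration; they are not from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TildeAwareSplit {
    // Compiled once and reused; the lookbehind only asserts that the colon
    // is not preceded by '~' and consumes nothing itself.
    private static final Pattern UNESCAPED_COLON = Pattern.compile("(?<!~):");

    public static String[] splitWithRegex(String text) {
        return UNESCAPED_COLON.split(text);
    }

    // Hypothetical helper: finds each colon with indexOf and skips the ones
    // that are immediately preceded by '~'.
    public static List<String> splitWithIndexOf(String text) {
        List<String> parts = new ArrayList<>();
        int tokenStart = 0;
        int colon = text.indexOf(':');
        while (colon >= 0) {
            boolean escaped = colon > 0 && text.charAt(colon - 1) == '~';
            if (!escaped) {
                parts.add(text.substring(tokenStart, colon));
                tokenStart = colon + 1;
            }
            colon = text.indexOf(':', colon + 1);
        }
        // Whatever follows the last unescaped colon is the final token.
        parts.add(text.substring(tokenStart));
        return parts;
    }

    public static void main(String[] args) {
        String text = "This:string:must~:be:split:when:previous:char:is:not~:this";
        System.out.println(splitWithIndexOf(text));
        System.out.println(Arrays.toString(splitWithRegex(text)));
    }
}
```

Both calls print the same nine tokens for the sample string. One difference to be aware of: this sketch keeps trailing empty tokens (for input such as "a:"), whereas split drops them, so adjust it if you need identical edge-case behaviour.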


Update: To make this fairer, I wanted to use a compiled Pattern and see the results of that. So I updated the code to use a compiled pattern, a non-compiled pattern, and my custom method.

While the custom method doesn't use a regex, it proves to be faster than the regex given.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitBenchmark {

    private static final String INPUT = "This:string:must~:be:split:when:previous:char:is:not~:this";

    public static void main(String[] args) {
        // Compiled once, outside the timing loops.
        Pattern pattern = Pattern.compile(":(?<!~:)");
        for (int runs = 0; runs < 4; ++runs) {
            // 1. String.split, which handles the regex on every call.
            long start = System.currentTimeMillis();
            for (int index = 0; index < 100000; ++index) {
                INPUT.split(":(?<!~:)");
            }
            long stop = System.currentTimeMillis();
            System.out.println("Run: " + runs + " Regex: " + (stop - start));

            // 2. The precompiled Pattern.
            start = System.currentTimeMillis();
            for (int index = 0; index < 100000; ++index) {
                pattern.split(INPUT);
            }
            stop = System.currentTimeMillis();
            System.out.println("Run: " + runs + " Compiled Regex: " + (stop - start));

            // 3. The hand-written splitter.
            start = System.currentTimeMillis();
            for (int index = 0; index < 100000; ++index) {
                specialSplit(INPUT);
            }
            stop = System.currentTimeMillis();
            System.out.println("Run: " + runs + " Custom: " + (stop - start));
        }

        for (String s : specialSplit(INPUT)) {
            System.out.println(s);
        }
    }

    // Splits on ':' unless the character immediately before it is '~'.
    public static String[] specialSplit(String text) {
        List<String> stringsAfterSplit = new ArrayList<String>();

        StringBuilder splitString = new StringBuilder();
        char previousChar = 0;
        for (int index = 0; index < text.length(); ++index) {
            char charAtIndex = text.charAt(index);
            if (charAtIndex == ':' && previousChar != '~') {
                // Unescaped delimiter: emit the current token and start a new one.
                stringsAfterSplit.add(splitString.toString());
                splitString.setLength(0);
            } else {
                splitString.append(charAtIndex);
            }
            previousChar = charAtIndex;
        }
        // Emit the last token, if there is one.
        if (splitString.length() > 0) {
            stringsAfterSplit.add(splitString.toString());
        }
        return stringsAfterSplit.toArray(new String[stringsAfterSplit.size()]);
    }
}

Output

Run: 0 Regex: 468
Run: 0 Compiled Regex: 365
Run: 0 Custom: 169
Run: 1 Regex: 437
Run: 1 Compiled Regex: 363
Run: 1 Custom: 166
Run: 2 Regex: 445
Run: 2 Compiled Regex: 363
Run: 2 Custom: 167
Run: 3 Regex: 436
Run: 3 Compiled Regex: 361
Run: 3 Custom: 167
This
string
must~:be
split
when
previous
char
is
not~:this


Try this one. [^~]:

Tested in JS
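One thing worth checking before using this pattern for splitting: [^~] is a normal character class, so it consumes the character in front of the colon and that character becomes part of the delimiter. The sketch below compares it with the lookbehind version in Java; the class and method names are only illustrative.

```java
import java.util.Arrays;

public class ConsumedCharDemo {
    // [^~]: matches the colon AND the character before it, so that
    // character is removed together with the delimiter.
    public static String[] splitConsuming(String text) {
        return text.split("[^~]:");
    }

    // (?<!~): only asserts what precedes the colon; just the colon is removed.
    public static String[] splitLookbehind(String text) {
        return text.split("(?<!~):");
    }

    public static void main(String[] args) {
        String text = "This:string:must~:be:split";
        System.out.println(Arrays.toString(splitConsuming(text)));
        // prints [Thi, strin, must~:b, split]
        System.out.println(Arrays.toString(splitLookbehind(text)));
        // prints [This, string, must~:be, split]
    }
}
```

The character class also cannot match a colon at the very start of the string, because it needs some character before it. So [^~]: works for finding the delimiters, but for split the lookbehind form is the one that keeps every token intact.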
