Matcher.find() returns false when it should be true

Developer https://www.devze.com 2023-02-25 14:02 Source: web
        String s = "test";
        Pattern pattern = Pattern.compile("\\n((\\w+\\s*[^\\n]){0,2})(\\b" + s + "\\b\\s)((\\w+\\s*){0,2})\\n?");
        Matcher matcher = pattern.matcher(searchableText);
        boolean topicTitleFound = matcher.find();
        startIndex = 0;
        while (topicTitleFound) {
            int i = searchableText.indexOf(matcher.group(0));
            if (i > startIndex) {
                builder.append(documentText.substring(startIndex, i - 1));
        ...

This is the text I am searching:

Some text comes here

topicTitle test :

test1 : testing123

test2 : testing456

test3 : testing789

test4 : testing9097

When I test this regex on http://regexpal.com/ or http://www.regexplanet.com, it clearly finds the title line, "topicTitle test". But in my Java code, topicTitleFound is false.

Please help


It could be that you have carriage-return characters ('\r') before the newline characters ('\n') in your searchableText. This would cause the match to fail at line boundaries.

To make your multi-line pattern more robust, try using the MULTILINE option when compiling the regex. Then use ^ and $ as needed to match line boundaries.

Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
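As a rough illustration of the MULTILINE approach, here is a minimal sketch that finds the first line containing a given whole word. The class name, the method name, and the simplified line-based regex are my own assumptions for the example, not the asker's pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineTitleDemo {
    // Hypothetical helper: returns the first line containing the whole word,
    // or null if no line contains it.
    static String findLineWithWord(String text, String word) {
        // With MULTILINE, ^ and $ match at the start and end of every line.
        // '.' never matches \r or \n, so the result excludes the line ending.
        Pattern p = Pattern.compile(
                "^.*\\b" + Pattern.quote(word) + "\\b.*$", Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Some text comes here\r\ntopicTitle test :\r\ntest1 : testing123\r\n";
        System.out.println(findLineWithWord(text, "test")); // prints "topicTitle test :"
    }
}
```

Because Java treats \r\n as a single line terminator for ^ and $, the same pattern should behave identically on Unix and Windows line endings.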

Update:

After actually testing out your code, I see that the pattern matches whether carriage-returns are present or not. In other words, your code "works" as-is, and topicTitleFound is true when it is first assigned (outside the while loop).

Are you sure that you are getting false for topicTitleFound? Or is the problem in the loop?
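For reference, a check along these lines is enough to reproduce that result. It is only a sketch: the class and method names are made up, and the sample text is assumed to be newline-separated, since the question doesn't show how searchableText is built.

```java
import java.util.regex.Pattern;

class CarriageReturnCheck {
    // Reports whether the pattern from the question finds anything in the text.
    static boolean titleFound(String searchableText) {
        String s = "test";
        Pattern pattern = Pattern.compile(
                "\\n((\\w+\\s*[^\\n]){0,2})(\\b" + s + "\\b\\s)((\\w+\\s*){0,2})\\n?");
        return pattern.matcher(searchableText).find();
    }

    public static void main(String[] args) {
        // Assumed layout of the sample text from the question.
        String lf = "Some text comes here\ntopicTitle test :\ntest1 : testing123\n"
                + "test2 : testing456\n";
        String crlf = lf.replace("\n", "\r\n");
        System.out.println(titleFound(lf));   // true
        System.out.println(titleFound(crlf)); // true
    }
}
```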

By the way, the use of indexOf() is wasteful and awkward, since the matcher already stores the index at which group 0 begins. Use this instead:

int i = matcher.start(0);
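Here is a sketch of how a loop built on start() and end() might look. The rest of your loop isn't shown, so the markMatches helper and the [[ ]] markers are purely illustrative assumptions, not a reconstruction of your code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchOffsetsDemo {
    // Hypothetical helper: wraps every match in [[ ]] and copies the text
    // between matches unchanged.
    static String markMatches(String text, Pattern pattern) {
        StringBuilder builder = new StringBuilder();
        Matcher matcher = pattern.matcher(text);
        int startIndex = 0;
        while (matcher.find()) {                  // re-evaluated on every pass
            int i = matcher.start();              // same as matcher.start(0)
            builder.append(text, startIndex, i);  // text before this match
            builder.append("[[").append(matcher.group()).append("]]");
            startIndex = matcher.end();           // resume after this match
        }
        builder.append(text, startIndex, text.length()); // trailing text
        return builder.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\btest\\b");
        System.out.println(markMatches("a test b test", p)); // a [[test]] b [[test]]
    }
}
```

Calling find() in the loop condition also guarantees that the loop ends once there are no more matches, which a boolean assigned once before the loop would not.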


Your regex is a bit hard to decipher; it's not really obvious what you're trying to do. One thing that springs to mind is that your regex expects the match to start with a newline, and your sample text doesn't start with one.
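To show what the leading \n does, here is a small sketch using a hypothetical input in which the title is the very first line. Whether this applies to you depends on how searchableText is built, which the question doesn't show; the class and method names are made up for the example.

```java
import java.util.regex.Pattern;

class LeadingNewlineDemo {
    // The pattern from the question, with s = "test".
    static final Pattern PATTERN = Pattern.compile(
            "\\n((\\w+\\s*[^\\n]){0,2})(\\btest\\b\\s)((\\w+\\s*){0,2})\\n?");

    static boolean titleFound(String searchableText) {
        return PATTERN.matcher(searchableText).find();
    }

    public static void main(String[] args) {
        // Hypothetical input: nothing precedes the title line.
        String text = "topicTitle test :\ntest1 : testing123\n";
        System.out.println(titleFound(text));        // false: no \n before the title
        System.out.println(titleFound("\n" + text)); // true: the leading \n can match
    }
}
```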
