lookahead and group

Developer https://www.devze.com 2023-01-01 02:06 Source: Internet


In Java, given a text like foo <on> bar </on> thing <on> again</on> now, I would like a regex with groups which, when I call find() repeatedly, gives me "foo", "bar", an empty string, then "thing", "again", "now".

If I use (.*?)<on>(.*?)</on>(?!<on>), I get only two groups per match (foo/bar, then thing/again), and I never get the trailing "now".

If I use (.*?)<on>(.*?)</on>((?!<on>)), I get foo, bar and an empty string, then thing, again and another empty string (where I would want "now").
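A small harness reproduces both results described above (the class and method names are made up for illustration). Every find() match has to contain a complete <on>...</on> pair, so the text after the last </on> is never consumed; and a lookahead is zero-width and only inspects the characters right after </on>, so a capturing group wrapped around (?!<on>) can only ever capture the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Attempts {
    static final String TEXT = "foo <on> bar </on> thing <on> again</on> now";

    // Runs find() repeatedly and collects every group of every match, trimmed.
    static List<List<String>> run(String regex) {
        List<List<String>> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(TEXT);
        while (m.find()) {
            List<String> groups = new ArrayList<>();
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i).trim());
            }
            out.add(groups);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("(.*?)<on>(.*?)</on>(?!<on>)"));   // [[foo, bar], [thing, again]]
        System.out.println(run("(.*?)<on>(.*?)</on>((?!<on>))")); // [[foo, bar, ], [thing, again, ]]
    }
}
```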

What is the magic formula?

Thanks.
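For comparison, here is one possible single-regex variant. It is a sketch, not taken from the thread, and it assumes single-line input because . does not match line terminators by default. The idea is to make the third group look ahead over the rest of the line: (?!.*<on>) succeeds only when no later <on> exists, in which case .* captures the trailing text; for every other match the empty alternative applies and the group is "". Whitespace is trimmed in Java rather than in the pattern, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeGroups {
    // Group 1: text before <on>; group 2: text inside <on>...</on>.
    // Group 3: the rest of the line if no further <on> follows, otherwise "".
    static final Pattern P = Pattern.compile("(.*?)<on>(.*?)</on>((?!.*<on>).*|)");

    static List<String> rows(String text) {
        List<String> rows = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            rows.add("[" + m.group(1).trim() + "] [" + m.group(2).trim()
                    + "] [" + m.group(3).trim() + "]");
        }
        return rows;
    }

    public static void main(String[] args) {
        rows("foo <on> bar </on> thing <on> again</on> now").forEach(System.out::println);
        // [foo] [bar] []
        // [thing] [again] [now]
    }
}
```

The lookahead rescans the remainder of the line after every match, which is harmless for short strings but quadratic in the worst case.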

If you insist on doing this with regex, you can try using \s*<[^>]*>\s* as the delimiter:

    String text = "foo <on> bar </on> thing <on> again</on> now";
    String[] parts = text.split("\\s*<[^>]*>\\s*");
    System.out.println(java.util.Arrays.toString(parts));
    // "[foo, bar, thing, again, now]"

I'm not sure this is exactly what you need, because the question isn't entirely clear.
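If you care about the empty strings mentioned in the question, String.split has two edge cases worth knowing: a delimiter match at the very start of the input produces a leading empty string, while the one-argument split(regex) discards trailing empty strings. This sketch (hypothetical class name) reuses the delimiter from above to show both; note also that the resulting flat array no longer records which pieces were inside <on>...</on>.

```java
import java.util.Arrays;

class SplitEdges {
    // Delimiter from the answer above: any tag plus the whitespace around it.
    static final String DELIM = "\\s*<[^>]*>\\s*";

    static String[] parts(String text) {
        return text.split(DELIM);
    }

    public static void main(String[] args) {
        // A tag at the very start yields a leading empty string...
        System.out.println(Arrays.toString(parts("<on> bar </on> now"))); // [, bar, now]
        // ...but split(regex) drops trailing empty strings.
        System.out.println(Arrays.toString(parts("foo <on> bar </on>"))); // [foo, bar]
    }
}
```

Passing a negative limit, as in text.split(DELIM, -1), keeps trailing empty strings as well.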

Perhaps something like this was required:

    String text = "1<on>2</on>3<X>4</X>5<X>6</X>7<on>8</on><X>9</X>10";
    String[] parts = text.split("\\s*</?on>\\s*|<[^>]*>[^>]*>");
    System.out.println(java.util.Arrays.toString(parts));
    // prints "[1, 2, 3, 5, 7, 8, , 10]"

This doesn't handle nested tags. If you have those, you'd really want to dump regex and use an actual HTML parser.
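To see the nesting problem concretely, this sketch (hypothetical class and input) runs the split from above and a reluctant <on>(.*?)</on> find on nested tags. The split flattens everything and loses the nesting, and the reluctant match pairs the outer <on> with the inner </on>.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedTags {
    // Delimiter regex taken from the answer above.
    static String[] splitParts(String text) {
        return text.split("\\s*</?on>\\s*|<[^>]*>[^>]*>");
    }

    // Content of the first reluctant <on>...</on> match, or null if there is none.
    static String firstInner(String text) {
        Matcher m = Pattern.compile("<on>(.*?)</on>").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String nested = "a<on>b<on>c</on>d</on>e";
        System.out.println(Arrays.toString(splitParts(nested))); // [a, b, c, d, e]
        System.out.println(firstInner(nested));                  // b<on>c
    }
}
```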

If you don't want the empty string in the middle of the array, just repeat the delimiter with (?:delimiter)+:

    String text = "1<on>2</on>3<X>4</X>5<X>6</X>7<on>8</on><X>9</X>10";
    String[] parts = text.split("(?:\\s*</?on>\\s*|<[^>]*>[^>]*>)+");
    System.out.println(java.util.Arrays.toString(parts));
    // prints "[1, 2, 3, 5, 7, 8, 10]"

My recommendations:

- There is no need to match the text before <on> and after </on>.
- Use a non-greedy (reluctant) quantifier such as .*? to match the text between <on> and the next </on>.
- If possible, use a loop with Matcher.find() to step through all occurrences. There is no need to do everything at once with one big fat regexp!
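The three recommendations can be combined roughly as follows. This is a sketch, not the answerer's code, and the class and method names are hypothetical. Only <on>(.*?)</on> is matched; the text between matches is recovered with Matcher.start() and Matcher.end(), and whatever follows the last match fills the third slot, which yields the triples asked for in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OnTagFinder {
    // Only the tag pair is matched; the reluctant .*? stops at the first </on>.
    private static final Pattern ON = Pattern.compile("<on>(.*?)</on>");

    // One {before, inside, after} triple per <on>...</on> pair, trimmed.
    // "after" stays "" except for the last pair, which gets the trailing text.
    static List<String[]> triples(String text) {
        List<String[]> result = new ArrayList<>();
        Matcher m = ON.matcher(text);
        int prevEnd = 0; // end of the previous match, i.e. start of unmatched text
        while (m.find()) {
            String before = text.substring(prevEnd, m.start()).trim();
            result.add(new String[] {before, m.group(1).trim(), ""});
            prevEnd = m.end();
        }
        if (!result.isEmpty()) {
            // Text after the last </on> becomes the third element of the last triple.
            result.get(result.size() - 1)[2] = text.substring(prevEnd).trim();
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] t : triples("foo <on> bar </on> thing <on> again</on> now")) {
            System.out.println(Arrays.toString(t));
        }
        // [foo, bar, ]
        // [thing, again, now]
    }
}
```

The regex itself never has to decide whether a pair is the last one; the loop knows that once find() returns false.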
