
Escape comma when using String.split

Developer https://www.devze.com 2023-02-10 07:16 Source: web

I'm trying to perform some super simple parsing of log files, so I'm using the String.split method like this:

String [] parts = input.split(",");

It works great for input like:

a,b,c

Or

type=simple, output=Hello, repeat=true 

Just to give a couple of examples.

How can I escape a comma so that the split doesn't match commas that are part of a value?

For instance, if I want to include a comma in one of the parts:

type=simple, output=Hello, world, repeate=true

I was thinking of something like:

type=simple, output=Hello\, world, repeate=true

But I don't know how to write the split so that it doesn't match the escaped comma.

I've tried:

String [] parts = input.split("[^\,],");

But, well, it's not working.


You can solve it using a negative lookbehind:

String[] parts = str.split("(?<!\\\\), ");

Basically, it says: split on each ", " that is not preceded by a backslash.

String str = "type=simple, output=Hello\\, world, repeate=true";
String[] parts = str.split("(?<!\\\\), ");
for (String s : parts)
    System.out.println(s);

Output:

type=simple
output=Hello\, world
repeate=true

(ideone.com link)
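The lookbehind only helps if whatever writes the log escapes commas in the first place. Here is a minimal round-trip sketch of both sides; escape() and parse() are hypothetical helper names, and it assumes values never contain a backslash of their own (a value ending in \ would make the following comma look escaped).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class EscapedCommaRoundTrip {
    // Hypothetical writer-side helper: escape each comma inside a value as "\,"
    static String escape(String value) {
        return value.replace(",", "\\,");
    }

    // Hypothetical reader-side helper: split on ", " not preceded by a backslash,
    // then turn every "\," back into a plain comma
    static String[] parse(String line) {
        String[] parts = line.split("(?<!\\\\), ");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replace("\\,", ",");
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("type=simple", "output=Hello, world", "repeate=true");
        String line = fields.stream()
                .map(EscapedCommaRoundTrip::escape)
                .collect(Collectors.joining(", "));
        System.out.println(line); // type=simple, output=Hello\, world, repeate=true
        for (String part : parse(line)) {
            System.out.println(part);
        }
    }
}
```

Joining with ", " on the writer side matters here, because the split pattern expects a space after each separating comma.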


If you happen to be stuck with the non-escaped comma-separated values, you could do the following (similar) hack:

String[] parts = str.split(", (?=\\w+=)");

This splits on each ", " that is followed by one or more word characters and an =.

(ideone.com link)
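Since every part produced by this split is a key=value pair, the result can go straight into a map. A sketch, assuming keys consist of word characters only and every part contains an =; toMap() is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LookaheadSplit {
    // Hypothetical helper: split only on ", " that is followed by "word=",
    // then cut each part at its first '=' into key and value.
    // Assumes every part contains an '=' (otherwise substring() would throw).
    static Map<String, String> toMap(String line) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps input order
        for (String part : line.split(", (?=\\w+=)")) {
            int eq = part.indexOf('=');
            result.put(part.substring(0, eq), part.substring(eq + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> m = toMap("type=simple, output=Hello, world, repeate=true");
        m.forEach((k, v) -> System.out.println(k + " -> " + v));
        // type -> simple
        // output -> Hello, world
        // repeate -> true
    }
}
```

The hack is only as good as its assumption: a value like "Hello, x=1" would still be cut in two, because the lookahead cannot tell it apart from a new key.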


I'm afraid there's no perfect solution with String.split. Using a Matcher for the three parts would work. If the number of parts is not constant, I'd recommend a loop with matcher.find(). Something like this, maybe:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String s = "type=simple, output=Hello\\, world, repeat=true";
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(,|$)");
final Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group(1));
    if (m.group(2).isEmpty()) break; // matched $: last part, so skip find()'s extra empty match at the end
}

You'll probably want to skip the spaces after the comma as well:

final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(,\\s*|$)");

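It's not really complicated; just note that you need four backslashes in the Java string literal to match one backslash in the input: the string literal turns \\\\ into \\, which the regex engine then reads as a single escaped backslash.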
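To see the two levels of escaping at work, here is a small sketch (the class name is just for illustration):

```java
public class BackslashLevels {
    public static void main(String[] args) {
        // Four backslashes in the source...
        String regex = "\\\\";
        // ...become two characters in the actual String: \\
        System.out.println(regex + " has length " + regex.length()); // \\ has length 2
        // ...which the regex engine reads as one literal backslash
        String input = "Hello\\, world"; // actual text: Hello\, world
        System.out.println(input.split(regex).length); // 2
        System.out.println(input.split(regex)[0]);     // Hello
    }
}
```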


Escaping works with the opposite of aioobe's answer, a negative lookbehind (update: aioobe now uses the same construct, but I didn't know that when I wrote this):

final String s = "type=simple, output=Hello\\, world, repeate=true";
final String[] tokens = s.split("(?<!\\\\),\\s*");
for(final String item : tokens){
    System.out.println("'" + item.replace("\\,", ",") + "'");
}

Output:

'type=simple'
'output=Hello, world'
'repeate=true'

Reference:

  • Pattern: Special Constructs
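One limitation of the lookbehind: it only inspects the single character in front of the comma. If the escape scheme also lets a value end in an escaped backslash (\\), as in dir=C:\\, the lookbehind sees a backslash there and refuses to split. A small hand-written scanner, used here instead of a regex, handles both \, and \\. This is a sketch under the assumption that a backslash always escapes the next character; split() is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeAwareSplitter {
    // Hypothetical helper: split on unescaped commas, where a backslash escapes
    // the next character ("\," is a literal comma, "\\" a literal backslash).
    // Escapes are resolved, and spaces right after a separator are skipped.
    static List<String> split(String line) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean skipSpaces = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (skipSpaces && c == ' ') continue; // drop spaces after a separator
            skipSpaces = false;
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // keep the escaped character literally
            } else if (c == ',') {
                parts.add(current.toString()); // unescaped comma ends the current part
                current.setLength(0);
                skipSpaces = true;
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString()); // last part
        return parts;
    }

    public static void main(String[] args) {
        // Actual text: type=simple, output=Hello\, world, dir=C:\\, repeate=true
        String line = "type=simple, output=Hello\\, world, dir=C:\\\\, repeate=true";
        for (String p : split(line)) {
            System.out.println(p);
        }
        // type=simple
        // output=Hello, world
        // dir=C:\
        // repeate=true
    }
}
```

For the same line, split("(?<!\\\\), ") would keep dir=C:\\, repeate=true together as one part.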


I think

input.split("[^\\\\],");

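should work. It will split at all commas that are not preceded by a backslash. Note, though, that the character class consumes the character in front of each comma, so that character is dropped from the resulting part (type=simple comes back as type=simpl); a negative lookbehind such as (?<!\\\\) only checks that character without consuming it. BTW, if you are working with Eclipse, I can recommend the QuickRex plugin to test and debug regexes.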
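The difference between the character class and the lookbehind is easiest to see side by side. A small sketch using the escaped input from the question:

```java
import java.util.Arrays;

public class CharClassVsLookbehind {
    public static void main(String[] args) {
        // Actual text: type=simple, output=Hello\, world, repeate=true
        String input = "type=simple, output=Hello\\, world, repeate=true";

        // [^\\], matches the comma AND the character before it, so split() drops that character
        System.out.println(Arrays.toString(input.split("[^\\\\],")));
        // [type=simpl,  output=Hello\, worl,  repeate=true]

        // (?<!\\), only asserts "no backslash before"; it consumes just the comma
        System.out.println(Arrays.toString(input.split("(?<!\\\\),")));
        // [type=simple,  output=Hello\, world,  repeate=true]
    }
}
```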
