Regex (java) help

Developer https://www.devze.com 2022-12-14 12:45 Source: web
How do I split this comma+quote delimited String into a set of strings:

String test = "[\"String 1\",\"String, two\"]"; 
String[] embeddedStrings = test.split("<insert magic regex here>");
//note: It should also work for this string, with a space after the separating comma: "[\"String 1\", \"String, two\"]";    

assertEquals("String 1", embeddedStrings[0]);
assertEquals("String, two", embeddedStrings[1]);

I'm fine with trimming the square brackets as a first step. But the catch is, even if I do that, I can't just split on a comma because embedded strings can have commas in them. Using Apache StringUtils is also acceptable.


You could also use one of the many open source small libraries for parsing CSVs, e.g. opencsv or Commons CSV.


If you can remove [\" from the start of the outer string and \"] from the end of it to become:

      String test = "String 1\",\"String, two"; 

You can use:

     test.split("\",\"");
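A minimal sketch of this trim-then-split approach, assuming the input always starts with [" and ends with "], and that no embedded string itself contains a quote-comma-quote sequence. The names QuoteSplit and splitQuoted are hypothetical. The separator is widened to allow whitespace around the comma, so the variant with a space after the separating comma should also work:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class QuoteSplit {
    // Separator between elements: closing quote, comma with optional whitespace, opening quote.
    private static final Pattern SEPARATOR = Pattern.compile("\"\\s*,\\s*\"");

    static String[] splitQuoted(String input) {
        String trimmed = input.trim();
        if (trimmed.length() < 4 || !trimmed.startsWith("[\"") || !trimmed.endsWith("\"]")) {
            throw new IllegalArgumentException("Expected input of the form [\"a\",\"b\"]: " + input);
        }
        // Drop the leading [" and the trailing "] before splitting.
        String body = trimmed.substring(2, trimmed.length() - 2);
        // A limit of -1 keeps trailing empty strings instead of discarding them.
        return SEPARATOR.split(body, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitQuoted("[\"String 1\", \"String, two\"]")));
    }
}
```

This stays simple because it never looks inside the elements, which is also why it cannot cope with escaped quotes in the data.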


This is extremely fragile and should be avoided, but you could match the string literals.

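import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A quoted literal: any run of characters that are neither a quote nor a backslash, or backslash-escaped characters.
Pattern p = Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

String test = "[\"String 1\",\"String, two\"]";
Matcher m = p.matcher(test);
List<String> embeddedStrings = new ArrayList<String>();
while (m.find()) {
    embeddedStrings.add(m.group(1)); // text between the quotes, escapes left as-is
}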

The regular expression assumes that double quotes in the input are escaped using \" and not "". The pattern would break if the input had an odd number of (unescaped) double quotes.
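As a sketch of how the literal-matching idea could be wrapped into a reusable method, the class below extracts each quoted literal and then removes the backslash escapes. The names QuotedLiterals and extract are hypothetical, and the pattern used here excludes backslashes from the plain-character class so that an escaped quote does not end the literal early. It still assumes \" escaping and balanced quotes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedLiterals {
    // Opening quote, then plain characters or backslash-escaped characters, then closing quote.
    private static final Pattern LITERAL = Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = LITERAL.matcher(input);
        while (m.find()) {
            // Replace each backslash-escaped character with the character itself.
            result.add(m.group(1).replaceAll("\\\\(.)", "$1"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("[\"String 1\", \"String, two\"]"));
    }
}
```

Because the matcher skips whatever lies between literals, the square brackets and the optional space after the comma need no special handling.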


Brute-force method: walk the string one character at a time, tracking whether you are inside a quoted section, and cut only at commas that fall outside quotes. This assumes that the brackets are already removed. Each element keeps its surrounding double quotes, so strip those afterwards.

boolean inquote = false;
List<String> strings = new ArrayList<String>();
int currStart = 0;
for (int i = 0; i < test.length(); i++) {
  char c = test.charAt(i);
  if (c == ',' && !inquote) {
    strings.add(test.substring(currStart, i));
    currStart = i + 1; // next element starts after the comma
  }
  else if (c == ' ' && !inquote && currStart == i)
    currStart = i + 1; // strip off spaces after a comma
  else if (c == '"')
    inquote = !inquote; // toggle on every double quote
}
strings.add(test.substring(currStart)); // last element has no trailing comma
String[] embeddedStrings = strings.toArray(new String[0]);
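A self-contained sketch of the same character-scanning idea, written so that it collects only the text between quotes and therefore needs neither the brackets removed beforehand nor the quotes stripped afterwards. The names QuoteScanner and parse are hypothetical, and treating a backslash as an escape character inside quotes is an assumption about the input format:

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteScanner {
    static List<String> parse(String input) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inQuote) {
                if (c == '\\' && i + 1 < input.length()) {
                    // Assumed escape convention: keep the next character literally.
                    current.append(input.charAt(++i));
                } else if (c == '"') {
                    // Closing quote ends the current element.
                    inQuote = false;
                    result.add(current.toString());
                    current.setLength(0);
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuote = true;
            }
            // Outside quotes, brackets, commas and spaces are simply skipped.
        }
        if (inQuote) {
            throw new IllegalArgumentException("Unterminated quoted string: " + input);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("[\"String 1\", \"String, two\"]"));
    }
}
```

Building each element in a StringBuilder avoids the index arithmetic of substring, which is where off-by-one mistakes tend to creep in.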