开发者

How to split a string by ',' or '[|]' unless the ',' is in '{}'

开发者 https://www.devze.com 2022-12-24 11:38 Source: the web
I am looking for a regex to split the following strings:

aaa[bbb,ccc[ddd,{eee:1,mmm:999}],nnn[0,3]]
aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]
aaa[bbb, ccc[ddd, ddd],nnn[0,3]]
aaa[bbb,ddd[0,3]]

by '[' or ']' or ',' unless the ',' is inside '{}'. For example, splitting 'aaa[bbb,ccc[ddd,' into aaa, bbb, ccc, ddd is allowed, but {eee:1,mmm:999} must not be split.

The expected result:

aaa, bbb, ccc, ddd, {eee:1,mmm:999}, nnn, 0, 3
aaa, bbb, ccc, ddd, {eee:1, mmm:[123,555]}, nnn, 0, 3
aaa, bbb, ccc, ddd, ddd, nnn, 0, 3
aaa, bbb, ddd, 0, 3

I have read many other questions, but I can't modify the regexes posted there to do what I want.

The target language for the expression is JavaScript.


It is not possible to do this using regular expressions and handle unlimited nested braces; you need a stack-based parser.


Perl/PCRE regex; it should work in JS too (as long as {} aren't nested), provided the ] inside the character class is escaped there:

$_ = 'aaa[bbb,ccc[ddd,{eee:1,mmm:999}],nnn[0,3]]
aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]
aaa[bbb, ccc[ddd, ddd],nnn[0,3]]
aaa[bbb,ddd[0,3]]';

@r = /[^][,{}]+|\{[^}]*}/g;
print join ", ", @r;

Output:

aaa, bbb, ccc, ddd, {eee:1,mmm:999}, nnn, 0, 3,
aaa, bbb, ccc, ddd, {eee:1, mmm:[123,555]}, nnn, 0, 3,
aaa, bbb,  ccc, ddd,  ddd, nnn, 0, 3,
aaa, bbb, ddd, 0, 3

A rough translation into JavaScript:

var input =
    "aaa[bbb,ccc[ddd,{eee:1,mmm:999}],nnn[0,3]]\n" +
    "aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]\n" +
    "aaa[bbb, ccc[ddd, ddd],nnn[0,3]]\n" +
    "aaa[bbb,ddd[0,3]]";

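// In JavaScript, "]" must be escaped inside a character class;
// an unescaped /[^]/ is a complete class that matches any character.
var re = /[^\[\],{}]+|\{[^}]*\}/g;

var result = [];
var match;
while ((match = re.exec(input)) !== null)
{
    result.push(match[0]);
}

// Using <<value>> rather than just a comma, for clarity around
// whether and how "{...}" was processed or not.
console.log("<<" + result.join(">><<") + ">>");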

It's not clear what the line breaks in the input or result data in the question are meant to be. In the above, they're line breaks in the input data and then not treated specially in the result. If they need to be treated specially, the OP can edit appropriately. And so this is the result of the above (again, using << and >> as separators rather than , for clarity around whether {...} gets processed):

<<aaa>><<bbb>><<ccc>><<ddd>><<{eee:1,mmm:999}>><<nnn>><<0>><<3>><<
aaa>><<bbb>><<ccc>><<ddd>><<{eee:1, mmm:[123,555]}>><<nnn>><<0>><<3>><<
aaa>><<bbb>><< ccc>><<ddd>><< ddd>><<nnn>><<0>><<3>><<
aaa>><<bbb>><<ddd>><<0>><<3>>


A non-regex way would be to write a loop that checks the string character by character. When it encounters a {, increment a counter. When it encounters a }, decrement the counter. When it encounters a , while the counter is at zero, add the position of the , to a list. When you're done, you have the list of positions where you want to split the string.

I'm assuming that no closing brace } occurs before its opening brace {; otherwise you might want to ignore the misplaced closing braces rather than decrementing the counter into the negatives.
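
The loop described above can be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name splitOutsideBraces is made up for this example, it collects the tokens directly instead of recording comma positions, it also treats [ and ] as delimiters (as the question asks), and it trims whitespace and skips empty tokens. A misplaced } at depth zero is kept as an ordinary character and does not push the counter below zero.

```javascript
// Hypothetical helper: split on '[', ']' and ',' except inside {...}.
function splitOutsideBraces(line) {
  var tokens = [];
  var current = "";
  var depth = 0; // number of '{' that are currently open

  for (var i = 0; i < line.length; i++) {
    var ch = line.charAt(i);

    if (ch === "{") {
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--; // never goes negative, so a stray '}' is harmless
    }

    var isDelimiter = ch === "[" || ch === "]" || ch === ",";
    if (depth === 0 && isDelimiter) {
      // A delimiter outside braces ends the current token.
      if (current.trim() !== "") {
        tokens.push(current.trim());
      }
      current = "";
    } else {
      // Everything else, including delimiters inside braces, is kept.
      current += ch;
    }
  }

  if (current.trim() !== "") {
    tokens.push(current.trim());
  }
  return tokens;
}

console.log(splitOutsideBraces("aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]").join(" | "));
// prints: aaa | bbb | ccc | ddd | {eee:1, mmm:[123,555]} | nnn | 0 | 3
```

Because the counter tracks depth rather than a simple on/off flag, this sketch also keeps nested groups such as {a:{b,c},d} in one piece, which the regex-based answers cannot do.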


Separate the {stuff} while you split the rest:

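function customRx(s){
 // Drop trailing delimiters so the final split leaves no empty token.
 s= s.replace(/[\[\],\s]+$/g,'');
 // Rx finds a {...} group (plus an adjacent comma), Rs splits the rest,
 // Rc trims the commas that Rx picked up.
 var Rx=/,?(\{[^}]+\}),?/g, Rs=/[\[\],\s]+/, Rc=/^,|,$/g;
 var A= [], i= 0, M, z= 0;
 // Split on Rs and discard the empty token that current engines return
 // when the text starts with a delimiter.
 function parts(t){
  return t.split(Rs).filter(function(p){ return p!== ''; });
 }
 while((M= Rx.exec(s))!= null){
  i= M.index;
  if(i> z){
   A.push(parts(s.substring(z, i)));
  }
  z= Rx.lastIndex;
  A.push(s.substring(i, z).replace(Rc,''));
 }
 if(s.length> z){
  A.push(parts(s.substring(z)));
 }
 return A;
}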

// test

var s1= 'aaa[bbb,ccc[ddd,{eee:1,mmm:999}],nnn[0,3]]'+
'aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]'+
'aaa[bbb, ccc[ddd, ddd],nnn[0,3]]'+
'aaa[bbb,ddd[0,3]]';

alert(customRx(s1).join(', '));

Returned value (newlines added):

aaa,bbb,ccc,ddd, {eee:1,mmm:999},

nnn,0,3,aaa,bbb,ccc,ddd, {eee:1, mmm:[123,555]},

nnn,0,3,aaa,bbb,ccc,ddd,ddd,nnn,

0,3,aaa,bbb,ddd,0,3


Assuming you're processing the text line by line, and that braces can't be nested, this split regex should work:

/ *[\[\],]+ *(?=[^{}]*(?:\{[^{}]*\}[^{}]*)*$)/

The first part -- *[\[\],]+ * -- matches one or more of [, ] or , and any surrounding spaces. The rest is a lookahead that asserts that, if there are any braces ahead of the matched characters, they come in balanced pairs. If the text is well formed, that ensures that a match won't occur inside a pair of braces.
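
A minimal sketch of how this split regex might be used in JavaScript, one line at a time. The helper name splitLine and the filtering step are assumptions added for illustration: because each sample line ends with ]], split returns an empty string as the last element, which is dropped here.

```javascript
// The split regex from the answer above, unchanged.
var splitRe = / *[\[\],]+ *(?=[^{}]*(?:\{[^{}]*\}[^{}]*)*$)/;

// Hypothetical helper: split one line and drop empty tokens.
function splitLine(line) {
  return line.split(splitRe).filter(function (token) {
    return token !== "";
  });
}

var lines = [
  "aaa[bbb,ccc[ddd,{eee:1,mmm:999}],nnn[0,3]]",
  "aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]",
  "aaa[bbb, ccc[ddd, ddd],nnn[0,3]]",
  "aaa[bbb,ddd[0,3]]"
];

lines.forEach(function (line) {
  console.log(splitLine(line).join(" | "));
});
// prints:
// aaa | bbb | ccc | ddd | {eee:1,mmm:999} | nnn | 0 | 3
// aaa | bbb | ccc | ddd | {eee:1, mmm:[123,555]} | nnn | 0 | 3
// aaa | bbb | ccc | ddd | ddd | nnn | 0 | 3
// aaa | bbb | ddd | 0 | 3
```

The $ in the lookahead anchors to the end of the whole string, which is why each line is passed separately rather than as one multi-line input.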
