Breaking a String into Chunks based on Pattern

Developer (开发者) https://www.devze.com 2023-01-29 03:12 Source: the web

I have one string, that looks like this:

a[abcdefghi,2,3,jklmnopqr]

The leading "a" is fixed, but the content within the brackets varies while following a pattern. It will always be an alphabetical string, possibly followed by numbers separated by commas, and then possibly by more strings and/or numbers.

I'd like to be able to break it into chunks of the string and any numbers that follow it until the "]" or another string is met.

Probably best explained through examples and expected ideal results:

a[abcdefghi]               -> "abcdefghi"
a[abcdefghi,2]             -> "abcdefghi,2"
a[abcdefghi,2,3,jklmnopqr] -> "abcdefghi,2,3" and "jklmnopqr"
a[abcdefghi,2,3,jklmnopqr,stuvwxyz]     -> "abcdefghi,2,3" and "jklmnopqr" and "stuvwxyz"
a[abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz] -> "abcdefghi,2,3" and "jklmnopqr,1,9" and "stuvwxyz"
a[abcdefghi,1,jklmnopqr,2,stuvwxyz,3,4] -> "abcdefghi,1" and "jklmnopqr,2" and "stuvwxyz,3,4"

Ideally a malformed string would be partially caught (but this is a nice extra):

a[2,3,jklmnopqr,1,9,stuvwxyz] -> "jklmnopqr,1,9" and "stuvwxyz"

I'm using JavaScript, and I realize a regex won't bring me all the way to the solution I'd like, but it could be a big help. The alternative is to do a lot of manual string parsing, which I can do but which doesn't seem like the best answer.

Advice, tips appreciated.

UPDATE: Yes, I did mean alphabetical (A-Za-z) instead of alphanumeric. Edited to reflect that. Thanks for letting me know.


You'd probably want to do this in 2 steps. First, match against:

a\[([^[\]]*)\]

and extract group 1. That'll be the stuff in the square brackets.

Next, repeatedly match against:

[a-z]+(,[0-9]+)*

That'll match things like "abcdefghi,2,3". After the first match you'll need to see if the next character is a comma and if so skip over it. (BTW: if you really meant alphanumeric rather than alphabetic like your examples, use [a-z0-9]*[a-z][a-z0-9]* instead of [a-z]+.)
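
The two steps above can be sketched roughly as follows. This is only an illustration: the function name splitChunks and the choice to return an empty array when the outer pattern does not match are assumptions, not part of the answer. A global String.match is used for the second step, which moves past the separating commas on its own, and the i flag is added so A-Z is accepted as well.

```javascript
// Hypothetical helper: the name and the "return [] on no match" policy are assumptions.
function splitChunks(input) {
  // Step 1: capture everything between "a[" and "]".
  const outer = /a\[([^[\]]*)\]/.exec(input);
  if (outer === null) return [];

  // Step 2: collect each word together with the numbers that trail it.
  // With the g flag, match() returns every match, or null if there is none.
  return outer[1].match(/[a-z]+(?:,[0-9]+)*/gi) || [];
}

console.log(splitChunks("a[abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz]"));
// [ 'abcdefghi,2,3', 'jklmnopqr,1,9', 'stuvwxyz' ]
```

Because the search simply skips anything that does not start a word, the malformed input a[2,3,jklmnopqr,1,9,stuvwxyz] should yield "jklmnopqr,1,9" and "stuvwxyz" with this sketch.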

Alternatively, split the string on commas and reassemble into your word with number groups.
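
A minimal sketch of that split-and-reassemble idea, assuming the text between the brackets has already been extracted. The name groupByWord is hypothetical, and silently dropping tokens that are neither a word nor a number following a word is one possible policy, not something the answer specifies.

```javascript
// Hypothetical helper: groups comma-separated tokens into "word,number,..." chunks.
function groupByWord(inner) {
  const chunks = [];
  let current = null; // tokens of the chunk being built; null until the first word

  for (const token of inner.split(",")) {
    if (/^[a-z]+$/i.test(token)) {
      // A word closes the previous chunk and starts a new one.
      if (current) chunks.push(current.join(","));
      current = [token];
    } else if (current && /^[0-9]+$/.test(token)) {
      // A number is attached to the word that precedes it.
      current.push(token);
    }
    // Anything else (leading numbers, empty tokens) is dropped.
  }

  if (current) chunks.push(current.join(","));
  return chunks;
}

console.log(groupByWord("abcdefghi,1,jklmnopqr,2,stuvwxyz,3,4"));
// [ 'abcdefghi,1', 'jklmnopqr,2', 'stuvwxyz,3,4' ]
```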


Why wouldn't a regex bring you all the way to a solution? The following regex works against the given data, but it makes a few assumptions (at least two alphas followed by comma separated single digits).

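([a-z]{2,}(?:,\d)*)

Example:

var re = new RegExp('[a-z]{2,}(?:,\\d)*', 'g');
// With the g flag, String.match returns every match; a single re.exec call returns only the first.
var matches = "a[abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz]".match(re);
// matches: ["abcdefghi,2,3", "jklmnopqr,1,9", "stuvwxyz"]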


Assuming you can easily break out the string between the brackets, something like this might be what you're after:

> re = new RegExp('[a-z]+(?:,\\d)*(?:,?)', 'gi')
> while (match = re.exec("abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz")) { print(match[0]) }
abcdefghi,2,3,
jklmnopqr,1,9,
stuvwxyz

This has the advantage of working partially in your malformed case:

> while (match = re.exec("2,3,jklmnopqr,1,9,stuvwxyz")) { print(match[0]) }
jklmnopqr,1,9,
stuvwxyz

The first character class [a-z] can be modified if you meant for it to be truly alphanumeric.
