Regular expression, value in between quotes

开发者 https://www.devze.com 2023-02-19 22:05 Source: web
I'm having a little trouble constructing a regular expression in Java.

The constraint is that I need to split a string separated by !. The two strings will be enclosed in double quotes. For example:

"value"!"value"

If I performed a java split() on the string above, I want to get:

value
value

However, the catch is that value can contain any characters: punctuation, digits, spaces, etc.

So here's a more concrete example. Input:

""he! "l0"!"wor!"d1"

Java's split() should return:

"he! "l0
wor!"d1

Any help is much appreciated. Thanks!


Try this expression: (".*")\s*!\s*(".*")

Although it would not work with split, it should work with Pattern and Matcher and return the 2 strings as groups.

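import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "\"  \"he\"\"\"\"! \"l0\" ! \"wor!\"d1\"";
Pattern p = Pattern.compile("(\".*\")\\s*!\\s*(\".*\")");
Matcher m = p.matcher(input);
// matches() only succeeds if the whole input fits the pattern
if(m.matches())
{
  String s1 = m.group(1); //"  "he""""! "l0"
  String s2 = m.group(2); //"wor!"d1"
}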

Edit:

This would not work for all cases, e.g. "he"!"llo" ! "w" ! "orld" would produce the wrong groups. In that case it would be really hard to determine which ! is meant to be the separator. That's why rarely used characters are often chosen to separate the parts of a string, like @ in email addresses :)
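To make the problem case concrete, here is a minimal sketch that runs the same expression against the four-part input mentioned above. The class name GreedyGroupsDemo and the helper groups() are made up for this illustration. Because .* is greedy, the first group swallows everything up to the last ! that sits between two quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyGroupsDemo {
    // Same expression as in the answer, written as a Java string literal.
    static final Pattern TWO_QUOTED = Pattern.compile("(\".*\")\\s*!\\s*(\".*\")");

    // Hypothetical helper: returns both groups, or null if the input does not match.
    static String[] groups(String input) {
        Matcher m = TWO_QUOTED.matcher(input);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] g = groups("\"he\"!\"llo\" ! \"w\" ! \"orld\"");
        System.out.println(g[0]); // "he"!"llo" ! "w"
        System.out.println(g[1]); // "orld"
    }
}
```

So with more than two parts, the first three end up lumped together in group 1.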


Split the value on "!" (quote, exclamation mark, quote) instead of on ! alone:

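String REGEX = "\"!\"";

String INPUT = "\"\"he! \"l0\"!\"wor!\"d1\"";

// needs: import java.util.regex.Pattern;
Pattern p = Pattern.compile(REGEX);
// items[0] still starts with the outer quote and items[1] still ends with it
String[] items = p.split(INPUT);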


It feels like you need to parse on:

DOUBLEQUOTE = "
OTHER = anything that isn't a double quote
EXCLAMATION = !
ITEM = DOUBLEQUOTE (OTHER | (DOUBLEQUOTE OTHER DOUBLEQUOTE))* DOUBLEQUOTE
LINE = ITEM (EXCLAMATION ITEM)*

It feels like it's possible to create a regular expression for the above (assuming the double quotes in an ITEM can't be nested even further), but it might be better served by a very simple grammar.

This might work... excusing missing escapes and the like

^"([^"]*|"[^"]*")*"(!"([^"]*|"[^"]*")*")*$
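For anyone who wants to try that expression in Java, here is a sketch with the escapes filled in. The class name ItemLineRegex and the method isLine() are invented for the example. It only checks whether a whole line fits the grammar and does not extract the items. Note that the nested quantifiers may backtrack heavily on long inputs that do not match.

```java
import java.util.regex.Pattern;

public class ItemLineRegex {
    // The expression from above as a Java string literal (every " escaped).
    static final Pattern LINE = Pattern.compile(
            "^\"([^\"]*|\"[^\"]*\")*\"(!\"([^\"]*|\"[^\"]*\")*\")*$");

    // Hypothetical helper: true if the whole input is ITEM (! ITEM)*.
    static boolean isLine(String input) {
        return LINE.matcher(input).matches();
    }

    public static void main(String[] args) {
        // "value"!"value" fits the grammar
        System.out.println(isLine("\"value\"!\"value\""));
        // ""he! "l0"!"wor!"d1" has an odd number of quotes, so it cannot fit
        System.out.println(isLine("\"\"he! \"l0\"!\"wor!\"d1\""));
    }
}
```

The first call prints true and the second prints false: the grammar needs the quotes inside an item to come in pairs, and the question's second example does not satisfy that.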

Another option would be to match against the first part, then, if there's a ! and more, prune off the ! and keep matching (excuse the no-particular-language, I'm just trying to illustrate the idea):

resultList = []
while (string matches /^("(?:[^"]*|"[^"]*")*")(.*)$/) {
    resultList += match(1)    // the quoted item at the front
    string = match(2)         // whatever follows it
    if (string.beginsWith("!")) {
        string = string[1:end]
    } elseif (string.length > 0) {
        // throw an error, since there was no exclamation and the string isn't done
    }
}
if (string.length > 0) {
    // throw an exception since the string isn't done
}
// resultList now holds the list of items in the string

EDIT: I realized that my answer doesn't really work. The strings can contain a single double quote as well as exclamation marks, so the quotes inside an item need not be balanced. The one thing a string really CAN'T contain is the sequence "!", because that would be indistinguishable from the separator. As such, the idea of 1) pulling the quotes off the ends and 2) splitting on '"!"' is really the right way to go.
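Those two steps might look roughly like this in Java. The class name QuotedSplit and the method splitQuoted() are placeholders for the sketch, and it assumes the whole line starts and ends with a double quote. Pattern.quote keeps the separator literal, and the -1 limit keeps empty trailing values.

```java
import java.util.regex.Pattern;

public class QuotedSplit {
    // Hypothetical helper: strip the outer quotes, then split on the literal "!" sequence.
    static String[] splitQuoted(String line) {
        if (line.length() < 2 || !line.startsWith("\"") || !line.endsWith("\"")) {
            throw new IllegalArgumentException("expected a line wrapped in double quotes: " + line);
        }
        // 1) pull the quotes off the ends
        String inner = line.substring(1, line.length() - 1);
        // 2) split on "!" (quote, exclamation mark, quote)
        return inner.split(Pattern.quote("\"!\""), -1);
    }

    public static void main(String[] args) {
        for (String s : splitQuoted("\"\"he! \"l0\"!\"wor!\"d1\"")) {
            System.out.println(s);
        }
        // prints:
        // "he! "l0
        // wor!"d1
    }
}
```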
