Using Java (1.6) I want to split an input string that consists of a header followed by a number of tokens. Each token has this format: a ! character, a space, a 2-character token name (from a constrained list, e.g. C0 or 04) and then 5 digits. I have built a pattern for this, but it fails for one token (CE) unless I remove the requirement for the 5 digits after the token name. The unit test explains this better than I could (see below).
Can anyone help with what's going on with my failing pattern? The input CE token looks OK to me...
Cheers!
@Test
public void testInputSplitAnomaly() {
    Pattern pattern = Pattern.compile("(?=(! [04|C0|Q2|Q6|C4|B[2-6]|Q[8-9]|C6|CE]\\d{5}))");
    splitByRegExp(pattern);
}

@Test
public void testInputSplitWorks() {
    Pattern pattern = Pattern.compile("(?=(! [04|C0|Q2|Q6|C4|B[2-6]|Q[8-9]|C6|CE]))");
    splitByRegExp(pattern);
}

public void splitByRegExp(Pattern pattern) {
    String input = "& 0000800429! C600080 123456789-! C000026 213 00300! 0400020 A1Y1! Q200002 13! CE00202 01 ! Q600006 020507! C400012 O00511011";
    String[] tokens = pattern.split(input);
    Arrays.sort(tokens);
    System.out.println("-----------------------------");
    for (String token : tokens) {
        System.out.println(token.substring(0, 11));
    }
    assertThat(tokens, Matchers.hasItemInArray(startsWith("! CE")));
    assertThat(tokens.length, is(8));
}
I think your mistake here is the use of square brackets. Don't forget that these denote a character class, so [04|C0|Q2|Q6|C4|B[2-6]|Q[8-9]|C6|CE] doesn't do what you expect it to.
What it does is the following. In Java, a bracketed set nested inside a character class, such as [2-6] or [8-9], is merged into the enclosing class as a union, and | has no special meaning inside a class. The whole expression is therefore a single character class that matches exactly one of: |, 0, 2, 3, 4, 5, 6, 8, 9, B, C, E or Q.
That is why the CE token doesn't work: the class consumes only the C, and the E that follows is not one of the 5 digits required by \d{5}. The other tokens pass only by accident, because the second character of their name happens to be a digit and is counted as the first of the 5 digits.
What you are probably after is (?:04|C0|Q2|Q6|C4|B[2-6]|Q[8-9]|C6|CE)
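As a rough check of the suggestion above, here is a minimal sketch that runs both the original pattern and the non-capturing-group version against the input string from the question. The class name TokenSplitDemo and its helper split are made up for illustration; only Pattern.compile and Pattern.split come from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the helper method are illustrative only.
public class TokenSplitDemo {
    // Input string copied from the question.
    static final String INPUT = "& 0000800429! C600080 123456789-! C000026 213 00300! 0400020 A1Y1! Q200002 13! CE00202 01 ! Q600006 020507! C400012 O00511011";

    // Original pattern: the square brackets form a character class,
    // so only ONE character is matched before \d{5}.
    static final Pattern BROKEN =
            Pattern.compile("(?=(! [04|C0|Q2|Q6|C4|B[2-6]|Q[8-9]|C6|CE]\\d{5}))");

    // Suggested fix: a non-capturing group of two-character alternatives.
    static final Pattern FIXED =
            Pattern.compile("(?=(! (?:04|C0|Q2|Q6|C4|B[2-6]|Q[8-9]|C6|CE)\\d{5}))");

    // Splits the question's input at every position where the lookahead matches.
    static String[] split(Pattern pattern) {
        return pattern.split(INPUT);
    }

    public static void main(String[] args) {
        // With the question's input the CE token is not split off here: 7 pieces.
        System.out.println("broken: " + split(BROKEN).length);
        // With the group, the header plus all seven tokens are separated: 8 pieces.
        System.out.println("fixed:  " + split(FIXED).length);
        for (String token : split(FIXED)) {
            System.out.println(token);
        }
    }
}
```

With the grouped version, the "! CE00202 01 " piece appears as its own array element, which is what the assertions in the question's test expect.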
This doesn't make any sense:
[04|C0|Q2|Q6|C4|B[2-6]|Q[8-9]|C6|CE]
I believe you want:
(?:04|C0|Q2|Q6|C4|B[2-6]|Q[8-9]|C6|CE)
Square brackets are only used for character classes, not general grouping. Use (?:...) or (...) for general grouping (the latter also captures).
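To make the difference concrete, here is a small sketch that compares a character class with the two kinds of group. The class name GroupingDemo, its helper methods and the shortened alternatives (C6|CE) are made up for illustration; the regex calls themselves (String.matches, Matcher.groupCount, Matcher.group) are standard JDK methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and the shortened patterns are illustrative only.
public class GroupingDemo {
    // Number of capturing groups declared in the given regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text captured by group 1 on the first match, or null if there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // A character class matches exactly one character, so "CE" as a whole fails.
        System.out.println("CE".matches("[C6|CE]"));      // false
        System.out.println("E".matches("[C6|CE]"));       // true

        // A group with alternatives matches whole two-character names.
        System.out.println("CE".matches("(?:C6|CE)"));    // true

        // (?:...) groups without capturing; (...) also records what it matched.
        System.out.println(groupCount("! (?:C6|CE)\\d{5}"));               // 0
        System.out.println(groupCount("! (C6|CE)\\d{5}"));                 // 1
        System.out.println(firstGroup("! (C6|CE)\\d{5}", "! CE00202 01")); // CE
    }
}
```

If the token name is needed later, the capturing form saves a second parse; if the group is only there to scope the alternation, the non-capturing form avoids shifting the numbering of any other groups.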