Is the regular expression [a-Z] valid and if yes then is it the same as [a-zA-Z]?
Please note that in [a-Z] the a is lowercase and the Z is uppercase.
Edit:
I received some answers specifying that while [a-Z] is not valid, [A-z] is valid (but won't be the same as [a-zA-Z]), and this is really what I was looking for, since I wanted to know in general whether it's possible to replace [a-zA-Z] with a more compact version.
Thanks to all who contributed to the answer.
No, a (97) is higher than Z (90), so [a-Z] isn't a valid character class. However, [A-z] wouldn't be equivalent either, but for a different reason. It would cover all the letters but would also include the characters between the uppercase and lowercase letters: [ \ ] ^ _ `
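To make the code-point argument concrete, here is a minimal Python sketch (Python is my choice for illustration, not something the question specifies) that prints the ASCII codes of the two endpoints and lists the characters lying strictly between Z and a.

```python
# ASCII code points of the two range endpoints.
code_a = ord("a")
code_Z = ord("Z")
print(code_a, code_Z)  # 97 90

# Characters strictly between 'Z' (90) and 'a' (97), i.e. codes 91..96.
between = "".join(chr(c) for c in range(code_Z + 1, code_a))
print(between)  # [\]^_`
```

These six characters are exactly what [A-z] picks up in addition to the letters.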
I'm not sure about other languages' implementations, but in PHP you can do "/[a-z]/i" and it will be case-insensitive. There is probably something similar for other languages.
You don't specify what language, but in general [a-Z] won't be a valid range, as in ASCII the lower-case alpha characters come after the upper-case ones. [A-z] might be a valid range (indicating all upper- and lower-cased alphas as well as the punctuation that appears between Z and a), but it might not be, depending on your particular implementation. The i flag can be added to the regex to make it case-insensitive; check your particular implementation for instructions on how to specify that flag.
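As one illustration of the case-insensitive flag, here is a short sketch in Python, where the flag is spelled re.IGNORECASE (other implementations spell it differently). The sample string "aZ_9" is just an assumption chosen for the demo.

```python
import re

# [a-z] with the case-insensitive flag, the compact alternative to [a-zA-Z].
pattern = re.compile(r"[a-z]", re.IGNORECASE)

# Test each character of a small sample on its own.
matches = [ch for ch in "aZ_9" if pattern.fullmatch(ch)]
print(matches)  # ['a', 'Z']
```

Unlike [A-z], this does not match the underscore or the other punctuation between Z and a. Note that on Python Unicode strings the flag can also make [a-z] match a handful of non-ASCII letters; adding re.ASCII restricts it to the 52 ASCII letters.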
You could always try it:
print "ok" if "monkey" =~ /[a-Z]/;
Perl says
Invalid [] range "a-Z" in regex; marked by <-- HERE in m/[a-Z <-- HERE ]/ at a-z.pl line 4.
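The same experiment can be tried in other engines. As a sketch, Python's re module also rejects the reversed range at compile time; the helper name is_valid_pattern below is hypothetical, introduced only for this example.

```python
import re

def is_valid_pattern(pattern: str) -> bool:
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        # Raised for malformed patterns, including reversed ranges like a-Z.
        return False
    return True

print(is_valid_pattern("[a-Z]"))  # False
print(is_valid_pattern("[A-z]"))  # True
```

This only shows what one engine does; as other answers note, some tools behave differently.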
If it's valid, it won't do what you expect.
The character code of Z is lower than the character code of a, so if an implementation swaps the endpoints to mean the range [Z-a], it will be the same as [Z\[\\\]^_`a], i.e. it will include the characters Z and a, and the characters between.
If you use [A-z] to get all upper and lower case characters, that is still not the same as [A-Za-z]; it's the same as [A-Z\[\\\]^_`a-z].
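This equivalence can be checked mechanically. The sketch below, in Python and restricted to the 128 ASCII characters (an assumption made to keep the check small), compares [A-z] against both the expanded class and [A-Za-z].

```python
import re

compact = re.compile(r"[A-z]")
expanded = re.compile(r"[A-Z\[\\\]^_`a-z]")
letters_only = re.compile(r"[A-Za-z]")

ascii_chars = [chr(c) for c in range(128)]

# True if [A-z] and the expanded class agree on every ASCII character.
same_as_expanded = all(
    bool(compact.fullmatch(ch)) == bool(expanded.fullmatch(ch))
    for ch in ascii_chars
)

# Characters on which [A-z] and [A-Za-z] disagree.
differs = [
    ch for ch in ascii_chars
    if bool(compact.fullmatch(ch)) != bool(letters_only.fullmatch(ch))
]

print(same_as_expanded)  # True
print(differs)           # ['[', '\\', ']', '^', '_', '`']
```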
I've just fallen over this in a script (not my own).
It seems that grep, awk, and sed accept [a-Z] depending on your locale (i.e. the LANG or LC_CTYPE environment variable). In the POSIX locale, [a-Z] isn't allowed by these tools, but in some other locales (e.g. en_gb.utf8) it works, and is the same as [a-zA-Z].
Yes, I've checked: it doesn't match any of _^[]`
Given that this took quite some time to debug, I strongly discourage anyone from ever using [a-Z] in a regex.
No, it's not valid, probably because in ASCII a (97) comes after Z (90), so the range from a to Z runs backwards.