Does anyone know why native2ascii generates lower-case hex codes, while Properties.store() produces upper-case hex?
Example: 保存 is encoded as \u4FDD\u5B58 when using Properties.store(), but as \u4fdd\u5b58 when using native2ascii.
Is there any way to control this?
I don't know why but I do know it doesn't matter (to Java anyway, it may matter to you a great deal). Unicode escapes are allowed to have upper or lower case hex digits so it really doesn't matter to Java which one is used (even mixed case is valid).
The reason they're different is probably something as simple as they were written by two different people.
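If you want to convince yourself that the case really is irrelevant to Java, here is a minimal sketch (the class name EscapeCaseDemo and the key name are made up for illustration). It loads the same value written with upper-case, lower-case and mixed-case escapes and compares the results:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class EscapeCaseDemo {
    // Parses a one-line properties text and returns the value stored under "key".
    static String load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty("key");
    }

    public static void main(String[] args) {
        // Backslashes are doubled so the escape reaches Properties.load() as text.
        String upper = load("key=\\u4FDD\\u5B58");
        String lower = load("key=\\u4fdd\\u5b58");
        String mixed = load("key=\\u4fDd\\u5B58");
        System.out.println(upper.equals(lower) && lower.equals(mixed)); // prints "true"
    }
}
```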
Is there any way to control it? Not easily, from what I can see. native2ascii doesn't appear to have any option to control that output (it accepts options to pass to the JVM, but nothing at that level). Properties.store() uses an OutputStream (and Properties.load() uses an InputStream), which you could probably subclass to filter the Unicode escapes, but that seems an awful lot of work for (what looks like) dubious benefit.
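For what it's worth, the OutputStream idea could look roughly like the sketch below. LowerCaseEscapeStream and storeLowerCase are hypothetical names of mine, not JDK classes, and I've only reasoned it through for simple cases. It relies on FilterOutputStream routing every byte through write(int), and it lower-cases the four hex digits that follow an unpaired backslash and a 'u':

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper, not part of the JDK.
public class LowerCaseEscapeStream extends FilterOutputStream {
    private int remaining = 0;       // hex digits still to convert
    private boolean escaped = false; // previous byte was an unpaired backslash

    public LowerCaseEscapeStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        int c = b & 0xFF;
        if (remaining > 0) {
            // Inside an escape: lower-case the hex letters A to F.
            remaining--;
            if (c >= 'A' && c <= 'F') {
                c = c - 'A' + 'a';
            }
        } else if (escaped) {
            // Character right after a backslash: only 'u' starts an escape.
            if (c == 'u') {
                remaining = 4;
            }
            escaped = false;
        } else if (c == '\\') {
            escaped = true;
        }
        out.write(c);
    }

    // Stores the properties through the filter and returns the text produced.
    static String storeLowerCase(Properties props) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream out = new LowerCaseEscapeStream(buffer)) {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("save", "\u4FDD\u5B58");
        System.out.print(storeLowerCase(props));
    }
}
```

Swapping the two character ranges would give an upper-casing filter instead, which you could apply to text that came from native2ascii.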
Perhaps if you could tell us why you need this, there may be another way.
Update 1:
One thing that you could do is to pass the native2ascii output through a filter which turns the Unicode escape sequences into uppercase. The following code, ucunicode.c, should be able to do this, although I've only given it cursory testing. Simply execute:
native2ascii inputFile | ucunicode
and you should see the likes of \u00EF\u00BB\u00BF instead of \u00ef\u00bb\u00bf.
#include <stdio.h>

int main (void) {
    int count = 0;      // used for converting four hex digits after "\u".
    int chminus2 = -1;  // character from two passes ago.
    int chminus1 = -1;  // character from one pass ago.
    int ch;             // character for this pass.

    // Standard filter loop.
    while ((ch = getc (stdin)) != EOF) {
        if (count-- > 0) {
            // If processing Unicode escape sequence, uppercase letters.
            putchar (((ch >= 'a') && (ch <= 'f')) ? ch - 'a' + 'A' : ch);
        } else {
            // Normal processing, detect escape sequence and flag it.
            if ((chminus2 != '\\') && (chminus1 == '\\') && (ch == 'u')) {
                count = 4;
            }

            // In any case, output the character.
            putchar (ch);
        }

        // Shift characters "left".
        chminus2 = chminus1;
        chminus1 = ch;
    }

    return 0;
}
There may be edge cases that this doesn't handle well. I'm pretty certain it will handle all valid input but it may break on invalid input like \u1\\u0000 but, since that means your native2ascii is broken, you'll need to debug that yourself. This is a good start, however.
Update 2:
Or, as a last-ditch solution, the OpenJDK project has the actual source files for native2ascii in jdk\src\share\classes\sun\tools\native2ascii\ (and just about everything else that's not encumbered by copyright), which you could bring down and compile yourself (GPL2 applies). The files are Main.java, A2NFilter.java and N2AFilter.java (and a couple of resource files). You'd simply have to change N2AFilter.java to call:
String hex = Integer.toHexString(buf[i]).toUpperCase();
instead of just:
String hex = Integer.toHexString(buf[i]);
In fact, by examining that source code, you can see that Properties.store() (in jdk/src/share/classes/java/util/Properties.java) uses the following functions to create its Unicode escapes:
private static final char[] hexDigit = {
    '0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'
};

private static char toHex (int nibble) {
    return hexDigit[(nibble & 0xF)];
}
This explains why it generates upper case while native2ascii produces lower case.
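To see the two styles side by side, here is a small sketch. It is not the JDK code itself: HexCaseDemo and both method names are invented, and the zero-padding loop is my own assumption about how the digits get padded to four. One method uses an upper-case lookup table like the one quoted above, the other uses Integer.toHexString(), which returns lower-case digits:

```java
public class HexCaseDemo {
    // Same idea as the lookup table quoted above: upper-case digits only.
    private static final char[] HEX_DIGIT = "0123456789ABCDEF".toCharArray();

    // Builds the escape one nibble at a time from the table.
    static String escapeWithTable(char c) {
        StringBuilder sb = new StringBuilder("\\u");
        for (int shift = 12; shift >= 0; shift -= 4) {
            sb.append(HEX_DIGIT[(c >> shift) & 0xF]);
        }
        return sb.toString();
    }

    // Builds the escape from Integer.toHexString(), padded to four digits.
    static String escapeWithToHexString(char c) {
        String hex = Integer.toHexString(c); // lower-case digits, no padding
        StringBuilder sb = new StringBuilder("\\u");
        for (int i = hex.length(); i < 4; i++) {
            sb.append('0');
        }
        return sb.append(hex).toString();
    }

    public static void main(String[] args) {
        char c = '\u4FDD';
        System.out.println(escapeWithTable(c));       // prints \u4FDD
        System.out.println(escapeWithToHexString(c)); // prints \u4fdd
    }
}
```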
I just hit on this, and I have a reason why it is annoying: I just converted a (Unicode) .properties file to another (homebrewed XML) format, with an export function back to .properties. This function uses Properties.store(), while the original Unicode .properties file was converted by ant via native2ascii. I wanted to check that they produce a similar result, so I applied sort to each and diff to the results. Most of the differing lines are in fact due to these case differences in the \u-escapes. (I think I'll use a quick sed script to convert the case of one of the files.)
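So, here is our sed script: s/\\u([0-9A-F]{4})/\\u\L\1\E/g (changes to lowercase), or s/\\u([0-9a-f]{4})/\\u\U\1\E/g (changes to uppercase). Both rely on extended regular expressions and on the \L, \U and \E case conversions, so they need GNU sed with -E (or -r). I had to change a bit more, since Properties.store() also escapes more characters, like ! to \!, = to \= and : to \:.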
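If sed isn't at hand, the same normalisation can be sketched in Java. The class and method names below are my own, and like the sed script it does not check whether the backslash is itself escaped, so treat it as a starting point only:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeCaseNormalizer {
    // A backslash, a 'u' and exactly four hex digits in either case.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String toUpper(String line) {
        return convert(line, true);
    }

    static String toLower(String line) {
        return convert(line, false);
    }

    // Rewrites the hex digits of every escape found in the line.
    private static String convert(String line, boolean upper) {
        Matcher m = ESCAPE.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String hex = upper
                    ? m.group(1).toUpperCase(Locale.ROOT)
                    : m.group(1).toLowerCase(Locale.ROOT);
            // quoteReplacement keeps the backslash literal in the output.
            m.appendReplacement(sb, Matcher.quoteReplacement("\\u" + hex));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUpper("save=\\u4fdd\\u5b58")); // prints save=\u4FDD\u5B58
        System.out.println(toLower("save=\\u4FDD\\u5B58")); // prints save=\u4fdd\u5b58
    }
}
```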