I ran into an issue that I can't figure out. Here is the problem: I have some data in a BLOB column in a DB2/Linux environment. The BLOB was written into DB2 after the byte[] was compressed using JDK compression (the code that does this runs in the Linux environment). I am trying to write a simple program in a Windows environment (my development environment) that reads some of this data, decompresses it (using the JDK), and creates a String from the decompressed byte array.
The issue is that after I decompress the BLOB (byte[]), the decompressed byte array is usually 1-3 bytes longer than expected. By "expected" I mean that the offset and length fields are also stored in the database, and the decompressed byte array is usually a few bytes longer than the stored length. So if I create a String from the decompressed byte array and then create a second String with substring(offset, length), using the offset and length fields from the database, the second String is shorter than the first.
An example: the database record contains a blob, offset: 0, length: 260,409. After decompressing the blob:
compressedByte[].length - 71,212
decompressedByte[].length - 260,412
new String(decompressedByte[]).length() - 260,412
new String(decompressedByte[]).substring(0, 260409).length() - 260,409
For other input records, the difference I am seeing is anywhere from 1 to 3 bytes.
I am puzzled by this and would appreciate any tips on how to debug it further. Could this be related to how the bytes are stored/written in the Linux environment versus how they are read in Windows? Thanks for your help.
I suspect the default encoding is different between the two systems.
// on the Linux box
byte[] blob = str.getBytes("UTF-8");
// in your code
String str = new String(blob, "UTF-8");
Or, at the very least, find out what the default encoding on the Linux box is (normally UTF-8) and skip step 1.
A really good explanation of what could be happening here is on Joel on Software.
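The suspected effect can be reproduced in a few lines. This is only a sketch: the class name EncodingMismatchDemo and the sample text are made up for illustration, and ISO-8859-1 stands in for a single-byte Windows default such as windows-1252 (the actual defaults on your two machines may differ, so the program prints the local default as well).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Decode the same bytes with the given charset and return the String length.
    static int decodedLength(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        // Hypothetical sample text: "é" (U+00E9) takes two bytes in UTF-8.
        String original = "caf\u00e9";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // What new String(bytes) and getBytes() use when no charset is given.
        System.out.println("default charset: " + Charset.defaultCharset());

        System.out.println(original.length());                                  // 4 chars
        System.out.println(utf8.length);                                        // 5 bytes
        System.out.println(decodedLength(utf8, StandardCharsets.UTF_8));        // 4
        // A single-byte charset turns every byte into one char.
        System.out.println(decodedLength(utf8, StandardCharsets.ISO_8859_1));   // 5
    }
}
```

If the stored length in the question counts characters, a few multi-byte characters in the text would account for a decompressed array that is 1-3 bytes longer than that length.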
A String is not a general holder for bytes. You will almost certainly have different default character encodings in your DB2/Linux environment and your Windows environment, which will cause the conversions back and forth between bytes and characters to differ.
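As a rough sketch of keeping the data as bytes until the last moment: decompress, decode once with an explicit charset, and only then slice by the stored offset and length. Several things here are assumptions, not facts from the question: that the writer used java.util.zip.Deflater, that the text was encoded as UTF-8, and that the stored offset and length count characters. The class and method names (BlobRoundTrip, compress, decompress, extract) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class BlobRoundTrip {

    // Stand-in for the Linux-side writer (assumed to use Deflater).
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate the blob back into raw bytes; no character decoding happens here.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }

    // Decode with an explicit charset, then slice by character offset and length.
    // Note that String.substring takes (beginIndex, endIndex), not (offset, length).
    static String extract(byte[] blob, int offset, int length) {
        String text = new String(decompress(blob), StandardCharsets.UTF_8);
        return text.substring(offset, offset + length);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 \u20ac"; // 6 chars, 9 bytes in UTF-8
        byte[] blob = compress(original.getBytes(StandardCharsets.UTF_8));
        System.out.println(decompress(blob).length);              // 9
        System.out.println(extract(blob, 0, 6).equals(original)); // true
    }
}
```

In this toy case the decompressed array is three bytes longer than the character count, yet decoding with the writer's charset gives back a String of exactly the stored length.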