Problem Inflating byte[] in Java?

Developer https://www.devze.com 2023-02-03 01:39 Source: Internet

I ran into an issue that I can't figure out. Here is the problem: I have some data in a Blob column in a DB2/Linux environment. The Blob was written into DB2 after the byte[] was compressed using JDK compression (the code that does this runs in the Linux environment). I am trying to write a simple program that reads some of this data, decompresses it (using the JDK), and creates a String from the decompressed byte array in a Windows environment (my development environment). The issue is that after I decompress the Blob (byte[]), the decompressed byte array is usually 1-3 bytes longer than expected. By "expected" I mean that the offset and length fields are also stored in the database, and the decompressed byte array is usually a few bytes longer than the stored length. So if I create a String from the decompressed byte array and then create another String using substring(offset, length) with the offset and length fields from the database, the second String (the one from substring) is shorter.

An example: a database record contains a blob with offset: 0, length: 260,409. After decompressing the blob:

 compressedByte[].length  - 71,212
 decompressedByte[].length   - 260,412
 new String(decompressedByte[]).length()  - 260,412
 new String(decompressedByte[]).substring(0, 260409).length() - 260,409

For other input records, the difference I am seeing is anywhere from 1 to 3 bytes.

I am puzzled by this and wondering if anyone could suggest some tips so I can debug it further. Could this be related to how the bytes are stored/written in the Linux environment versus how they are read in Windows? Thanks for your help.

I suspect the default encoding is different between the two systems.

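// on the Linux box: encode with an explicit charset before compressing
byte[] blob = str.getBytes("UTF-8");

// in your code: decode the decompressed bytes with the same charset
String str = new String(blob, "UTF-8");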

Or at the very least, find out what the default encoding on the Linux box is (normally UTF-8) and use that when decoding in your code; then you can skip step 1.

A really good explanation of what could be happening here is on Joel on Software.
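To make the suspicion concrete, here is a minimal sketch of how a few extra "characters" could appear. It assumes the text was encoded as UTF-8 on the Linux side and contains a couple of non-ASCII characters, and it uses ISO-8859-1 as a stand-in for a single-byte Windows default charset; the class name, helper, and sample string are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Decode the bytes with the given charset and report the String length.
    static int decodedLength(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        // Hypothetical sample: 10 characters, 2 of them non-ASCII.
        String original = "na\u00efve caf\u00e9";

        // UTF-8 needs 2 bytes for each of the two accented letters.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(original.length());                                   // 10
        System.out.println(utf8.length);                                         // 12
        // A single-byte charset turns every byte into one char.
        System.out.println(decodedLength(utf8, StandardCharsets.ISO_8859_1));    // 12
        // Decoding with the charset used for encoding restores the length.
        System.out.println(decodedLength(utf8, StandardCharsets.UTF_8));         // 10
    }
}
```

If the stored length is a character count, a record with one to three such characters would show exactly the kind of 1-3 difference described in the question. That is only one possible explanation, so it is worth checking what the Linux code actually does.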

A String is not a general holder for bytes. You will undoubtedly have different default character encodings in your DB2/Linux environment and your Windows environment, which causes the conversion back and forth between bytes and characters to differ.
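Along those lines, here is a rough sketch of keeping the data as bytes until one explicit decode at the end. The question only says "JDK compression", so the use of java.util.zip.Deflater/Inflater and of UTF-8 are assumptions, and the class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class BlobRoundTrip {
    // Writer side (assumed): encode explicitly as UTF-8, then deflate.
    static byte[] compress(String text) {
        Deflater deflater = new Deflater();
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reader side: inflate to bytes, then decode with the same charset.
    static String decompress(byte[] blob) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(blob);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported blob");
                }
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("blob is not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        String text = "na\u00efve caf\u00e9";
        String back = decompress(compress(text));
        System.out.println(back.equals(text)); // true
    }
}
```

One related detail: String.substring takes a begin index and an end index, so with a non-zero offset the call would need to be substring(offset, offset + length) rather than substring(offset, length).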