What could lead to the creation of a false EOF in a GZip-compressed data stream?

We are streaming data in batches from a server (written in .NET, running on Windows) to a client (written in Java, running on Ubuntu). The data is in XML format. Occasionally the Java client throws an unexpected EOF while trying to decompress the stream. The message content always varies and is user-driven. The response from the client is also compressed using GZip; this never fails and seems to be rock solid. The response from the client is controlled by the system.

Is there a chance that some arrangement of characters, or some special characters, are creating false EOF markers? Could it be whitespace-related? Is GZip suitable for compressing XML?

I am assuming that the code to read from and write to the input/output streams works, because we only occasionally get this exception, and when we inspect the user data at the time there seem to be special characters (which is why I asked the question), such as the '@' sign.

Any ideas?

UPDATE: The actual code, as requested. I didn't think the problem was here, because I had been to a couple of sites for help on this issue and they all used more or less the same code. Some sites mentioned appended GZip, something to do with GZip creating multiple segments?

public String receive() throws IOException {

    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream(8192);

    do {
        int nrBytes = in.read(buffer);
        if (nrBytes > 0) {
            baos.write(buffer, 0, nrBytes);
        }
    } while (in.available() > 0);
    return compressor.decompress(baos.toByteArray());
}

public String decompress(byte[] data) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    ByteArrayInputStream in = new ByteArrayInputStream(data);

    try {
        GZIPInputStream inflater = new GZIPInputStream(in); 
        byte[] byteBuffer = new byte[8192];
        int r;
        while((r = inflater.read(byteBuffer)) > 0 ) {
            buffer.write(byteBuffer, 0, r); 
        }
    } catch (IOException e) {
        log.error("Could not decompress stream", e);
        throw e;
    }
    return new String(buffer.toByteArray());
}

At first I thought there must be something wrong with the way I was reading in the stream, and that perhaps I was not looping properly. I then generated a ton of data to be streamed and checked that it was looping. Also, the fact that it happens so seldom, and so far has not been reproducible, led me to believe that it was the content rather than the scenario. But at this point I am totally baffled, and for all I know it is the code.
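One possible reading of the loop in receive(): InputStream.available() only reports how many bytes can be read without blocking at that moment; it is not an end-of-message signal. If a batch arrives over TCP in more than one segment, the loop can exit after the first segment and hand a truncated buffer to decompress(). The sketch below simulates that with a hypothetical TwoSegmentStream class (not part of any library) that makes only the first 6,000 bytes of a 10,000-byte message "available" until they have been read. readWhileAvailable is the loop copied from receive().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class AvailableLoopDemo {

    /** Hypothetical stream that delivers its data in two "segments", as TCP might. */
    static class TwoSegmentStream extends InputStream {
        private final byte[] data;
        private final int split;          // number of bytes in the first segment
        private int pos = 0;
        private boolean secondArrived = false;

        TwoSegmentStream(byte[] data, int split) {
            this.data = data;
            this.split = split;
        }

        @Override
        public int read() throws IOException {
            byte[] one = new byte[1];
            int n = read(one, 0, 1);
            return n == -1 ? -1 : one[0] & 0xFF;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (pos >= data.length) return -1;
            // A real socket read would block here until the second segment arrived.
            if (pos >= split) secondArrived = true;
            int limit = secondArrived ? data.length : split;
            int n = Math.min(len, limit - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }

        @Override
        public int available() {
            // Only bytes that have already "arrived" can be read without blocking.
            int limit = secondArrived ? data.length : split;
            return limit - pos;
        }
    }

    /** The loop from receive(): it stops as soon as available() reports 0. */
    static byte[] readWhileAvailable(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream baos = new ByteArrayOutputStream(8192);
        do {
            int nrBytes = in.read(buffer);
            if (nrBytes > 0) {
                baos.write(buffer, 0, nrBytes);
            }
        } while (in.available() > 0);
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] message = new byte[10_000];
        Arrays.fill(message, (byte) 'x');
        byte[] received = readWhileAvailable(new TwoSegmentStream(message, 6_000));
        System.out.println(received.length + " of " + message.length + " bytes read");
    }
}
```

Because the outcome depends on network timing rather than on the message text, this kind of failure would look random and be hard to reproduce on demand.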

Thanks again everyone.
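On the appended-GZip question: a GZIP stream may consist of several members laid end to end (RFC 1952), and as far as I know GZIPInputStream on current JDKs reads through all of them, so multiple members alone should not produce an EOF. The small sketch below lets you check this on your own JDK; the gzip/gunzip helper names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatenatedGzipDemo {

    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } // close() finishes the member and writes its trailer (CRC-32 and length)
        return bytes.toByteArray();
    }

    static String gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[8192];
            int r;
            while ((r = in.read(buf)) != -1) {   // -1 means end of the GZIP data
                out.write(buf, 0, r);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] first = gzip("<batch id=\"1\"/>");
        byte[] second = gzip("<batch id=\"2\"/>");
        // Two complete GZIP members placed end to end in one buffer.
        byte[] both = new byte[first.length + second.length];
        System.arraycopy(first, 0, both, 0, first.length);
        System.arraycopy(second, 0, both, first.length, second.length);
        System.out.println(gunzip(both));
    }
}
```

The .NET Compress function shown in the next update produces one member per call, so this only matters if several batches end up in the same buffer.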

Update 2:

As requested the .Net code:

Dim DataToCompress = Encoding.UTF8.GetBytes(Data)
Dim CompressedData = Compress(DataToCompress)

This converts the raw data into bytes, which are then compressed:

Private Function Compress(ByVal Data As Byte()) As Byte()
    Try
        Using MS = New MemoryStream()
            Using Compression = New GZipStream(MS, CompressionMode.Compress)
                Compression.Write(Data, 0, Data.Length)
                Compression.Flush()
                Compression.Close()
                Return MS.ToArray()
            End Using
        End Using
    Catch ex As Exception
        Log.Error("Error trying to compress data", ex)
        Throw
    End Try
End Function
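The .NET side encodes the string with Encoding.UTF8 before compressing, while the Java decompress method calls new String(byte[]), which uses the JVM's default charset. That is unlikely to cause the EOF (decoding happens after decompression, so a mismatch would garble characters rather than truncate the stream), but decoding explicitly as UTF-8 keeps both ends in agreement. A minimal sketch of decompress with that change and try-with-resources; the class name Utf8Decompressor is only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class Utf8Decompressor {

    /** Inflates one complete GZIP buffer and decodes it as UTF-8, matching Encoding.UTF8 on the .NET side. */
    public static String decompress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream inflater = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[8192];
            int r;
            // read() returns -1 only at the end of the GZIP data.
            while ((r = inflater.read(buf)) != -1) {
                out.write(buf, 0, r);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```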

Update 3: I also added more Java code. The in variable is the InputStream returned from socket.getInputStream().

It certainly shouldn't be due to the data involved - the streams deal with binary data, so that shouldn't make any odds at all.
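To illustrate that GZip is byte-oriented, the sketch below compresses and decompresses a buffer containing every byte value from 0x00 to 0xFF (so '@', spaces, newlines and control characters are all included) and compares the result with the input. The helper names are illustrative, and readAllBytes() requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BinaryRoundTrip {

    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    static byte[] gunzip(byte[] packed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return in.readAllBytes(); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        // Every possible byte value, repeated 40 times.
        byte[] raw = new byte[256 * 40];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }
        boolean same = Arrays.equals(raw, gunzip(gzip(raw)));
        System.out.println("round trip identical: " + same);
    }
}
```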

However, without seeing your code, it's hard to say for sure. My first port of call would be to check anywhere that you're using InputStream.read() - check that you're using the return value correctly, rather than assuming a single call to read() will fill the buffer.
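Along those lines, the receive() loop in the question stops when in.available() returns 0, which does not guarantee that the whole batch has arrived. One common way to make the reader wait for a complete batch is to frame it. The sketch below assumes (hypothetically; the .NET code shown does not do this) that the server writes the compressed length as a 4-byte big-endian integer before each batch. DataInputStream.readInt() and readFully() then block until all of those bytes are available. FramedReceiver and frame() are illustrative names, and readAllBytes() needs Java 9+.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class FramedReceiver {

    private final DataInputStream in;

    public FramedReceiver(InputStream socketStream) {
        this.in = new DataInputStream(socketStream);
    }

    /** Reads one length-prefixed batch (hypothetical framing) and decompresses it. */
    public String receive() throws IOException {
        int length = in.readInt();          // blocks until all 4 length bytes arrive
        if (length < 0) {
            throw new IOException("Negative batch length: " + length);
        }
        byte[] compressed = new byte[length];
        in.readFully(compressed);           // blocks until exactly `length` bytes arrive
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Sender side of the same hypothetical framing, used here only for testing. */
    static byte[] frame(String text) throws IOException {
        ByteArrayOutputStream gzBytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(gzBytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(framed);
        out.writeInt(gzBytes.size());       // big-endian, which is what readInt() expects
        gzBytes.writeTo(out);
        out.flush();
        return framed.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write(frame("<batch id=\"1\"/>"));
        wire.write(frame("<batch id=\"2\"/>"));
        FramedReceiver receiver = new FramedReceiver(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(receiver.receive());
        System.out.println(receiver.receive());
    }
}
```

If the .NET server adopted this, it would have to write the length in big-endian order; BinaryWriter writes integers little-endian.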

If you could provide some code, that would help a lot...

I suspect that the data is being altered in transit because it is treated as text rather than binary, so it could be either newline (\n) conversion or a code page alteration.
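A quick way to see the effect this answer describes: the sketch below pushes GZip output through a String, as if a transport treated it as UTF-8 text. The second GZIP magic byte, 0x8b, is not valid on its own in UTF-8, so decoding replaces it with U+FFFD and re-encoding writes three bytes (EF BF BD) in its place. A single-byte charset such as ISO-8859-1 would round-trip the bytes unchanged, so which encoding is involved matters. throughText is a hypothetical stand-in for a text-mode transport.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class TextConversionDemo {

    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /** Simulates a transport that wrongly treats binary data as UTF-8 text. */
    static byte[] throughText(byte[] binary) {
        String asText = new String(binary, StandardCharsets.UTF_8); // malformed sequences become U+FFFD
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] original = gzip("<user>someone@example.com</user>");
        byte[] mangled = throughText(original);
        System.out.printf("first bytes: %02x %02x -> %02x %02x%n",
                original[0], original[1], mangled[0], mangled[1]);
        System.out.println("identical after text round trip: " + Arrays.equals(original, mangled));
    }
}
```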

How is the gzipped stream transferred between the two systems?

It is not possible. EOF in TCP is delivered as an out-of-band FIN segment, not via the data.
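Consistent with this, the "unexpected EOF" from GZIPInputStream is not a marker found in the data. It is thrown when the decompressor runs out of input bytes before reaching the end of the GZIP stream (the final deflate block plus the 8-byte CRC-32/length trailer); in the JDKs I have used the message is "Unexpected end of ZLIB input stream". A buffer cut short, for example by a read loop that stops early, reproduces it. The sketch below truncates a compressed XML batch to half its length; TruncationDemo and failsWithEof are illustrative names, and readAllBytes() needs Java 9+.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TruncationDemo {

    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /** Returns true if decompressing only the first `keep` bytes fails with EOFException. */
    static boolean failsWithEof(byte[] gz, int keep) throws IOException {
        byte[] truncated = Arrays.copyOf(gz, keep);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(truncated))) {
            in.readAllBytes();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder xml = new StringBuilder("<batch>");
        for (int i = 0; i < 1000; i++) {
            xml.append("<row id=\"").append(i).append("\"/>");
        }
        xml.append("</batch>");
        byte[] gz = gzip(xml.toString());
        System.out.println("complete buffer throws EOFException: " + failsWithEof(gz, gz.length));
        System.out.println("half buffer throws EOFException:     " + failsWithEof(gz, gz.length / 2));
    }
}
```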