We have a system where a client makes an HTTP GET request, the system does some processing on the backend, zips the results, and sends it to the client. Since the processing can take some time, we send this as a ZipOutputStream wrapping the response.getOutputStream().
However, when we have an exceptionally small amount of data in the first ZipEntry, and the second entry takes a long time, the browser the client is using times out. We've tried flushing the stream buffer, but no response seems to be sent to the browser until at least 1000 bytes have been written to the stream. Oddly, once the first 1000 bytes have been sent, subsequent flushes seem to work fine.
I tried stripping down the code to bare-bones to give an example:
    protected void doGet(HttpServletRequest request,
            HttpServletResponse response) throws ServletException, IOException {
        try {
            ZipOutputStream _zos = new ZipOutputStream( response.getOutputStream());
            ZipEntry _ze = null;
            long startTime = System.currentTimeMillis();
            long _lByteCount = 0;
            response.setContentType("application/zip");
            while (_lByteCount < 2000) {
                _ze = new ZipEntry("foo");
                _zos.putNextEntry( _ze );
                //writes 100 bytes and then waits 10 seconds
                _lByteCount += StreamWriter.write(
                        new ByteArrayInputStream(DataGenerator.getOutput().toByteArray()),
                        _zos );
                System.out.println("Zip: " + _lByteCount + " Time: " + ((System.currentTimeMillis() - startTime) / 1000));
                //trying to flush
                _zos.finish();
                _zos.flush();
                response.flushBuffer();
                response.getOutputStream().flush();
            }
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
I set my browser timeout to about 20 seconds for easy reproduction. Despite writing the 100 bytes a couple of times, nothing is sent to the browser and the browser times out. If I extend the browser timeout, nothing gets sent until 1000 bytes have been written, and then the browser pops up the "Would you like to save..." dialog. Again, after the initial 1000 bytes, each additional 100 bytes is sent right away rather than being buffered into 1000-byte chunks.
If I set the max byte count in the while condition to 200 or so, it works fine, sending only 200 bytes.
What can I do to force the servlet to send back really small initial amounts of data?
It turns out that the underlying Apache/Windows IP stack buffers data from a stream in an attempt to be efficient. Since most people have the problem of too much data, not too little, this is the right behavior most of the time. What we ended up doing was requiring the user to request enough data that we'd hit the 1000-byte limit before timing out. Sorry for taking so long to answer the question.
I know this is a really, really old question, but for the record, I wanted to post an answer that should fix the issue you are experiencing.
The key is that you want to flush the response stream, not the zip stream, because the ZIP stream cannot flush what is not yet ready to write. Your client, as you mentioned, is timing out because it does not receive a response within a predetermined amount of time. Once it receives data, however, it is patient and will wait a very long time to download the file, so the fix is easy, provided you flush the correct stream. I recommend the following:
    protected void doGet(HttpServletRequest request,
            HttpServletResponse response) throws ServletException, IOException {
        try {
            ZipOutputStream _zos = new ZipOutputStream( response.getOutputStream());
            ZipEntry _ze = null;
            long startTime = System.currentTimeMillis();
            long _lByteCount = 0;
            response.setContentType("application/zip");
            // force an immediate response of the expected content
            // so the client can begin the download process
            response.flushBuffer();
            while (_lByteCount < 2000) {
                _ze = new ZipEntry("foo");
                _zos.putNextEntry( _ze );
                //writes 100 bytes and then waits 10 seconds
                _lByteCount += StreamWriter.write(
                        new ByteArrayInputStream(DataGenerator.getOutput().toByteArray()),
                        _zos );
                System.out.println("Zip: " + _lByteCount + " Time: " + ((System.currentTimeMillis() - startTime) / 1000));
                //trying to flush
                _zos.finish();
                _zos.flush();
            }
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
Now, what should happen here is that the headers and response code are committed along with anything already in the response's output buffer. This does not close the stream, so any additional writes to the stream are appended. The downside of doing it this way is that you cannot know the content length to put in the header. The upside is that the download starts immediately and the browser is not allowed to time out.
My guess is that the zip output stream doesn't actually write anything before being able to compress stuff. The Huffman algorithm used for zipping requires all data to be known before actually being able to compress anything. It can't start before everything is known, basically.
Zipping might be a win if the amount of data is big, but I don't think you can achieve an asynchronous response while zipping data.
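One way to check this guess without a servlet container is to point a ZipOutputStream at a ByteArrayOutputStream and watch how many bytes reach it at each step. The sketch below is only an illustration: the class name ZipWriteProbe and the method probe are made up for this example, and the in-memory sink stands in for the response stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical probe class; not part of the code in the question.
final class ZipWriteProbe {

    /**
     * Returns the number of bytes that have reached the underlying stream
     * after putNextEntry, after write + flush, and after closeEntry.
     */
    static int[] probe(int payloadSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int[] sizes = new int[3];
        try (ZipOutputStream zos = new ZipOutputStream(sink)) {
            zos.putNextEntry(new ZipEntry("foo"));
            sizes[0] = sink.size();   // the entry's local header is written right away

            zos.write(new byte[payloadSize]);
            zos.flush();
            sizes[1] = sink.size();   // a small payload may still be held by the compressor

            zos.closeEntry();
            sizes[2] = sink.size();   // closing the entry forces out the remaining entry data
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] s = probe(100);
        System.out.println("after putNextEntry: " + s[0]);
        System.out.println("after write+flush:  " + s[1]);
        System.out.println("after closeEntry:   " + s[2]);
    }
}
```

The first number is already greater than zero, so the stream does not wait for all data before writing anything. If the second number equals the first, the 100-byte payload is still sitting inside the compressor despite the flush(), and it only shows up once the entry is closed.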
I can't reproduce your problem at all. Below is your code, slightly altered, running in an embedded Jetty server. I ran it in IntelliJ and requested http://localhost:8080 from Firefox. As expected, the "Save or Open" dialog popped up after 1 second. Selecting "save" and waiting for 20 seconds results in a zip file which can be opened and contains 20 separate entries, named foo<number>, each containing a single line 100 characters wide and ending with <number>. This is on Windows 7 Premium 64 with JDK 1.6.0_26. Chrome acts the same way. IE, on the other hand, seems to normally wait for 5 seconds (500 bytes), though once it showed the dialog immediately, and another time it seemed to wait for 9 or 10 seconds. Try it in different browsers:
    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.servlet.ServletContextHandler;
    import org.eclipse.jetty.servlet.ServletHolder;
    import javax.servlet.ServletException;
    import javax.servlet.http.*;
    import java.io.IOException;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    public class ZippingAndStreamingServlet {
        public static void main(String[] args) throws Exception {
            Server server = new Server(8080);
            ServletContextHandler context = new ServletContextHandler(ServletContextHandler.SESSIONS);
            context.setContextPath("/");
            server.setHandler(context);
            context.addServlet(new ServletHolder(new BufferingServlet()), "/*");
            server.start();
            System.out.println("Listening on 8080");
            server.join();
        }

        static class BufferingServlet extends HttpServlet {
            protected void doGet(HttpServletRequest request,
                    HttpServletResponse response) throws ServletException, IOException {
                ZipOutputStream _zos = new ZipOutputStream(response.getOutputStream());
                ZipEntry _ze;
                long startTime = System.currentTimeMillis();
                long _lByteCount = 0;
                int count = 1;
                response.setContentType("application/zip");
                response.setHeader("Content-Disposition", "attachment; filename=my.zip");
                while (_lByteCount < 2000) {
                    _ze = new ZipEntry("foo" + count);
                    _zos.putNextEntry(_ze);
                    byte[] bytes = String.format("%100d", count++).getBytes();
                    System.out.println("Sending " + bytes.length + " bytes");
                    _zos.write(bytes);
                    _lByteCount += bytes.length;
                    sleep(1000);
                    System.out.println("Zip: " + _lByteCount + " Time: " + ((System.currentTimeMillis() - startTime) / 1000));
                    _zos.flush();
                }
                _zos.close();
            }

            private void sleep(int millis) {
                try {
                    Thread.sleep(millis);
                } catch (InterruptedException e) {
                    throw new IllegalStateException("Unexpected interrupt!", e);
                }
            }
        }
    }
You could be getting screwed by the Java API.
Looking through the JavaDocs of the various classes in the OutputStream family (OutputStream, ServletOutputStream, FilterOutputStream, and ZipOutputStream), they either mention that they rely on the underlying stream for flush() or they declare that flush() doesn't do anything (OutputStream).
ZipOutputStream inherits flush() from FilterOutputStream by way of DeflaterOutputStream (which only gained its own flush() override in Java 7).
From the FilterOutputStream JavaDoc:
The flush method of FilterOutputStream calls the flush method of its underlying output stream.
In the case of ZipOutputStream, it is being wrapped around the stream returned from ServletResponse.getOutputStream(), which is a ServletOutputStream. It turns out that ServletOutputStream doesn't implement flush() either; it inherits it from OutputStream, which specifically mentions in its JavaDoc:
public void flush() throws IOException
Flushes this output stream and forces any buffered output bytes to be written out. The general contract of flush is that calling it is an indication that, if any bytes previously written have been buffered by the implementation of the output stream, such bytes should immediately be written to their intended destination.
If the intended destination of this stream is an abstraction provided by the underlying operating system, for example a file, then flushing the stream guarantees only that bytes previously written to the stream are passed to the operating system for writing; it does not guarantee that they are actually written to a physical device such as a disk drive.
**The flush method of OutputStream does nothing.**
Maybe this is a special case, I don't know. I do know that flush() has been around a long time and it is unlikely that no one has noticed a hole in the functionality there.
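The delegation described in the JavaDoc can be checked with a tiny stand-in for the servlet stream. This is only a sketch: FlushCountingStream and flushThroughZip are hypothetical names invented for the example, and the counting stream merely plays the role of whatever concrete stream the container hands out.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only.
final class FlushCountingStream extends OutputStream {
    int flushCalls = 0;

    @Override
    public void write(int b) {
        // Discard the data; only flush() calls are of interest here.
    }

    @Override
    public void flush() {
        flushCalls++;
    }

    /** Wraps a counting stream in a ZipOutputStream, flushes once, and reports the count. */
    static int flushThroughZip() {
        FlushCountingStream sink = new FlushCountingStream();
        try (ZipOutputStream zos = new ZipOutputStream(sink)) {
            zos.flush();
            return sink.flushCalls;   // read before the stream is closed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("flush() calls seen by the wrapped stream: " + flushThroughZip());
    }
}
```

So a flush() on the zip stream does reach the wrapped stream. What happens after that depends entirely on how the wrapped stream implements flush(), which is where the container and the OS come in.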
It makes me wonder if there is an operating system component to the stream buffering that could be configured to remove the 1k buffer effect.
A related question has a similar issue, but it was working directly with a file instead of through a Java Stream abstraction, and this answer points to the MSDN articles on file buffering and file caching.
A similar scenario was listed in the bug database.
Summary
The Java IO library relies on the OS implementation for Streams. If the OS has caching turned on, Java code may not be able to force a different behavior. In the case of Windows, you have to open the file and pass non-standard parameters to get write-through-cache or no-buffering behavior. I doubt the Java SDK provides such OS-specific options, since they are trying to create platform-generic APIs.
The issue is that by default each servlet implementation buffers the data whereas SSE and other custom requirements might/will need data immediately.
The solution is to do the following:
response.setBufferSize(1) // or some similar small number for such servlets.
This will ensure that the data is written out earlier (with the resultant performance loss)
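To get a feel for how buffer size affects when data leaves, here is a plain java.io analogy that runs without a container. It is an assumption-laden sketch, not the servlet container's actual buffer: BufferedOutputStream stands in for the response buffer, and the names BufferSizeDemo and bytesReachingSink are made up for this example.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; BufferedOutputStream stands in for the response buffer.
final class BufferSizeDemo {

    /**
     * Writes the given number of 100-byte chunks through a buffer of the given size
     * and returns how many bytes have reached the sink without any flush() call.
     */
    static int bytesReachingSink(int bufferSize, int chunks) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream buffered = new BufferedOutputStream(sink, bufferSize);
        byte[] chunk = new byte[100];
        try {
            for (int i = 0; i < chunks; i++) {
                buffered.write(chunk);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("1024-byte buffer, 5 chunks:  " + bytesReachingSink(1024, 5));
        System.out.println("1024-byte buffer, 11 chunks: " + bytesReachingSink(1024, 11));
        System.out.println("1-byte buffer, 1 chunk:      " + bytesReachingSink(1, 1));
    }
}
```

With a 1024-byte buffer, five 100-byte writes leave nothing in the sink, and data only appears once the buffer can no longer hold the next chunk. With a 1-byte buffer, the first 100-byte write goes straight through. That is the same trade-off as the setBufferSize suggestion: earlier delivery at the cost of more, smaller writes.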