Why is it more memory efficient to read input as stream vs. string?

We're using HttpClient to implement a REST API.

We're reading the server response using:

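```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.PostMethod;

PostMethod method = new PostMethod(url);
HttpClient client = new HttpClient();
int statusCode = client.executeMethod(method);
// Buffers the whole response body in memory before returning it as a String
String responseBody = method.getResponseBodyAsString();
```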

When we do this, we get this warning:

```
Dec 9, 2009 7:41:11 PM org.apache.commons.httpclient.HttpMethodBase getResponseBody
WARNING: Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
```

The docs go on to say:

HttpClient is capable of efficient request/response body streaming. Large entities may be submitted or received without being buffered in memory. This is especially critical if multiple HTTP methods may be executed concurrently. While there are convenience methods to deal with entities such as strings or byte arrays, their use is discouraged. Unless used carefully they can easily lead to out of memory conditions, since they imply buffering of the complete entity in memory.

So my question is: if you do need the complete response as a String (e.g. to store in a DB, or to parse using DOM), why is it more memory efficient to use a stream?

It is more efficient to use a stream than to get the entire entity as a String because, with the latter, the entire contents of the response must be read before they can be returned to your code, and control cannot return to your code until the server has sent the entire response.

If you process the response as a stream, you are actually handling it N bytes at a time. This means you can begin processing the first segment of the response while the remote server is still sending the next one. Streaming is therefore the better access method whenever your use case allows you to process the data as it is received.
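To make the "N bytes at a time" idea concrete, here is a minimal sketch that computes a SHA-256 digest of a response body while reading it in 8 KB chunks. The class name `ChunkedDigest`, the choice of a digest as the processing step, and the buffer size are illustrative assumptions, not anything HttpClient provides; the `ByteArrayInputStream` stands in for the stream you would get from `getResponseBodyAsStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChunkedDigest {
    // Processes the stream one 8 KB chunk at a time; only a single buffer is
    // held in memory, no matter how large the response body is.
    static byte[] sha256(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            // Handle this chunk, then reuse the same buffer for the next one
            digest.update(buffer, 0, n);
        }
        return digest.digest();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for method.getResponseBodyAsStream()
        try (InputStream body = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8))) {
            byte[] hash = sha256(body);
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            System.out.println(hex);
        }
    }
}
```

Memory use stays at one buffer plus the digest state whether the body is a few bytes or several gigabytes, which the buffered `getResponseBodyAsString()` path cannot offer.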

If, however, you need the entire response as a String for whatever reason, none of the efficiencies of the stream method help you at all: even if you read the response in pieces, you still have to wait for the entire response, and hold all of it in a single String, before you can process it.

The efficiency of streaming is only available to you if your use case lets you begin processing the response before you have the entire response body.

Overall, the process is not more memory efficient. If you read from a stream and then put the result into a String, you are just splitting the same work into two steps so that the HttpClient class doesn't notice it.

If you really need the entire String, you can ignore the warning. It is then up to you to make sure it doesn't use too much memory per request, so that the server can't easily be brought down by a DoS attack.
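One way to read the whole body into a String while still putting a ceiling on memory per request is a bounded read, sketched below. `BoundedRead.readAtMost` is a hypothetical helper and the limit in `main` is arbitrary; the charset is passed in because the correct one depends on the response's Content-Type. This does not save memory compared with the buffered convenience method: every chunk is kept, so the full body is still held in memory. It only makes an oversized or malicious response fail fast instead of exhausting the heap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BoundedRead {
    // Reads the whole stream into a String, but aborts once more than maxBytes
    // have arrived. All chunks are accumulated, so the body is still fully buffered.
    static String readAtMost(InputStream in, long maxBytes, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
            if (total > maxBytes) {
                // Stop before storing the chunk that crosses the limit
                throw new IOException("Response body exceeds " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the HTTP response stream
        try (InputStream body = new ByteArrayInputStream("{\"ok\":true}".getBytes(StandardCharsets.UTF_8))) {
            System.out.println(readAtMost(body, 1024 * 1024, StandardCharsets.UTF_8));
        }
    }
}
```

Because the check happens before each chunk is stored, at most `maxBytes` bytes are ever kept, plus the one reusable read buffer.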

Your question confuses the point.

If you ABSOLUTELY need the whole response as a String, then do that, but if you can get away with it at all, use streams.

When you load the whole response into a String, the entire response body is present in memory at once.

Using streams, only a small portion of the response is held in memory at a time.

The documentation is saying that, especially with multiple large requests in flight at once, loading each whole response body into a String will require a lot of memory.

If you're parsing into an org.w3c.dom.Document (or, better yet, an org.jdom.Document), it's really easy to use the stream directly. For example:

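```java
org.apache.http.HttpResponse hr = httpClient.execute(httpRequest);
org.apache.http.HttpEntity he = hr.getEntity();
org.jdom.input.SAXBuilder builder = new org.jdom.input.SAXBuilder();
// build(InputStream) parses straight from the response stream, with no intermediate String
org.jdom.Document document = builder.build(he.getContent());
```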
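The same approach works with the standard DOM API mentioned in the question: `DocumentBuilder.parse` accepts an `InputStream`, so the body never has to exist as a String. Below is a minimal JDK-only sketch; `DomFromStream` is an illustrative name and the `ByteArrayInputStream` stands in for `he.getContent()`. Keep in mind that the resulting DOM tree still holds the whole document in memory; what you avoid is the additional full-size String copy of the body alongside it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class DomFromStream {
    // Parses XML directly from the stream; no String copy of the body is made.
    static Document parse(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the response stream (he.getContent())
        byte[] xml = "<items><item>a</item><item>b</item></items>".getBytes(StandardCharsets.UTF_8);
        try (InputStream body = new ByteArrayInputStream(xml)) {
            Document doc = parse(body);
            System.out.println(doc.getElementsByTagName("item").getLength());
        }
    }
}
```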