How to get the full content from HttpWebResponse if the return content is Transfer-Encoding:chunked?

开发者 https://www.devze.com 2023-04-04 05:57 Source: web
I am writing a program that downloads HTML pages from other websites. For some particular sites I cannot get the full HTML; I only receive part of the content. The servers with this problem send their data with "Transfer-Encoding: chunked", and I suspect that is the cause.

This is the header information returned by the server:

Transfer-Encoding: chunked
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Content-Type: text/html; charset=UTF-8
Date: Sun, 11 Sep 2011 09:46:23 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Server: nginx/1.0.6
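
For context, "Transfer-Encoding: chunked" means the body arrives as a series of chunks, each one a hexadecimal size line, CRLF, that many bytes of data, and another CRLF, ending with a chunk of size 0. As far as I know, HttpWebResponse removes this framing before the response stream hands any data to your code, so you normally never parse it yourself. The sketch below only illustrates the framing; the class name ChunkedSketch and the method Decode are hypothetical, it assumes the whole body is already in memory, and it ignores trailers.

```csharp
using System.Globalization;
using System.IO;
using System.Text;

public static class ChunkedSketch
{
    // Illustration only: decodes a complete chunked body held in memory.
    public static byte[] Decode(byte[] raw)
    {
        var body = new MemoryStream();
        int pos = 0;
        while (true)
        {
            // Each chunk starts with a size line terminated by CRLF.
            int lineEnd = IndexOfCrlf(raw, pos);
            if (lineEnd < 0) throw new InvalidDataException("Missing chunk size line.");
            string sizeLine = Encoding.ASCII.GetString(raw, pos, lineEnd - pos);

            // Optional chunk extensions follow a ';' and are ignored here.
            int semi = sizeLine.IndexOf(';');
            if (semi >= 0) sizeLine = sizeLine.Substring(0, semi);

            // The size is written in hexadecimal.
            int size = int.Parse(sizeLine.Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            pos = lineEnd + 2;

            if (size == 0) break; // a zero-size chunk marks the end of the body
            if (pos + size > raw.Length) throw new InvalidDataException("Truncated chunk.");

            body.Write(raw, pos, size);
            pos += size + 2; // skip the chunk data and its trailing CRLF
        }
        return body.ToArray();
    }

    // Returns the index of the next CRLF at or after start, or -1 if none.
    private static int IndexOfCrlf(byte[] data, int start)
    {
        for (int i = start; i + 1 < data.Length; i++)
        {
            if (data[i] == (byte)'\r' && data[i + 1] == (byte)'\n') return i;
        }
        return -1;
    }
}
```

For example, the raw bytes "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n" carry two chunks of 5 and 7 bytes followed by the terminating zero-size chunk.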

Here is my code:

HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
HttpWebResponse response;
CookieContainer cookie = new CookieContainer();
request.CookieContainer = cookie;
request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.UserAgent =
    @"Mozilla/5.0 (Windows NT 6.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2 FirePHP/0.6";
request.Accept = @"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
string html = string.Empty;
response = request.GetResponse() as HttpWebResponse;

using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    html = reader.ReadToEnd();
}

I only get partial HTML (I think it is the first chunk from the server). Could anyone help? Is there a solution?

Thanks!


ReadToEnd may not be giving you the complete chunked body here. Try reading directly from the response stream with Stream.Read in a loop and decoding each block with Encoding.GetString:

StringBuilder sb = new StringBuilder();
Byte[] buf = new byte[8192];
int count; // number of bytes returned by the most recent Read call
Stream resStream = response.GetResponseStream();

do
{
     // Read may return fewer bytes than requested; it returns 0 only at end of stream
     count = resStream.Read(buf, 0, buf.Length);
     if (count != 0)
     {
          sb.Append(Encoding.UTF8.GetString(buf, 0, count)); // just hardcoding UTF8 here
     }
} while (count > 0);
String html = sb.ToString();
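
One caveat with decoding each block separately: a multi-byte UTF-8 character can be split across two Read calls, and Encoding.GetString then produces replacement characters at the boundary. A possible variation, sketched below, keeps a single Decoder for the whole stream so that partial characters carry over between reads. The names StreamText and ReadAllText are hypothetical, and the encoding is passed in by the caller rather than detected.

```csharp
using System.IO;
using System.Text;

public static class StreamText
{
    // Reads the stream to its end and decodes it with one stateful Decoder.
    public static string ReadAllText(Stream stream, Encoding encoding, int bufferSize = 8192)
    {
        var sb = new StringBuilder();
        byte[] buf = new byte[bufferSize];
        // Large enough for any single decode of up to bufferSize bytes.
        char[] chars = new char[encoding.GetMaxCharCount(bufferSize)];
        // The decoder remembers incomplete byte sequences between calls.
        Decoder decoder = encoding.GetDecoder();

        int count;
        // Read returns 0 only when the end of the stream has been reached.
        while ((count = stream.Read(buf, 0, buf.Length)) > 0)
        {
            int n = decoder.GetChars(buf, 0, count, chars, 0);
            sb.Append(chars, 0, n);
        }
        return sb.ToString();
    }
}
```

With the response from the question this would be called as StreamText.ReadAllText(response.GetResponseStream(), Encoding.UTF8). Since the server above declares charset=UTF-8 that choice fits here; for other sites the encoding could be taken from response.CharacterSet, assuming the server sets it correctly.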


If I've understood what you're asking, you can read the response line by line:

string htmlLine = reader.ReadLine();
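
A single ReadLine call returns only one line, so it has to run in a loop until it returns null, which signals the end of the stream. Here is a minimal sketch of that loop; the names LineReading and ReadAllLines are hypothetical, and any TextReader, including the StreamReader from the question, can be passed in.

```csharp
using System.IO;
using System.Text;

public static class LineReading
{
    // Collects every line from the reader until ReadLine reports end of stream.
    public static string ReadAllLines(TextReader reader)
    {
        var sb = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // ReadLine strips the line terminator, so add one back.
            sb.AppendLine(line);
        }
        return sb.ToString();
    }
}
```

Note that this normalizes line endings to Environment.NewLine, so the result may differ byte for byte from what the server sent.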