HttpWebRequest versus browser request

Developer https://www.devze.com 2023-02-02 10:15 Source: Internet
I used to retrieve data from a site (nseindia.com) using a C# program. Recently, however, NSE made some changes, and now any request from a program is answered with a “403 Forbidden” error. Can anyone tell me how to make the request from the program identical to the one sent by the browser? I tried setting the UserAgent property, but that is not working. The code is pasted below.

string DownloadData(string CompanyName)
{
    string address = string.Format(@"http://www.nseindia.com");
    //http://www.nseindia.com/marketinfo/sym_map/symbolMapping.jsp?dataType=priceVolumeDeliverable&symbol=abb&
    //http://www.nseindia.com/content/equities/scripvol/datafiles/01-12-2008-TO-29-12-2010ABBALLN.csv
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(address);
    request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3

    string strData = "";
    try
    {
        request.Proxy = WebProxy.GetDefaultProxy();
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        System.IO.Stream stream = response.GetResponseStream();
        System.Text.Encoding ec = System.Text.Encoding.GetEncoding("utf-8");
        System.IO.StreamReader reader = new System.IO.StreamReader(stream, ec);
        strData = reader.ReadToEnd();
        if (strData.Contains("Error"))
        {
            Exception e = new Exception(strData);
            throw e;
        }
    }
    catch(Exception e)
    {
        Console.WriteLine(e.ToString());
    }

    return strData;
}


Your best bet is to spy on your browser to see exactly which requests are sent and which responses are received.

There are numerous add-ins for that, depending on your browser.
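
To compare like with like, it can also help to print what the program itself is about to send. The following is only a minimal sketch, assuming the `HttpWebRequest` from the question; `RequestDebug` and `DumpRequest` are hypothetical names, and headers that .NET adds only at send time (such as `Host`) will not appear in the listing.

```csharp
using System.Net;
using System.Text;

static class RequestDebug
{
    // Hypothetical helper: renders the method, path and explicitly set
    // headers of a request so they can be compared with a browser capture.
    public static string DumpRequest(HttpWebRequest request)
    {
        var sb = new StringBuilder();
        sb.AppendLine(request.Method + " " + request.RequestUri.PathAndQuery);
        foreach (string name in request.Headers.AllKeys)
        {
            sb.AppendLine(name + ": " + request.Headers[name]);
        }
        return sb.ToString();
    }
}
```

Calling `Console.Write(RequestDebug.DumpRequest(request));` just before `request.GetResponse()` gives a listing that can be placed side by side with the output of the browser add-in.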


Try setting the Accept HTTP header; e.g.:

request.Accept = "text/html,application/xhtml+xml,application/xml";

I arrived at this suggestion by running Fiddler2 (as suggested in a comment to another answer) in order to see how my browser (Firefox 4 Beta) makes the HTTP request to the website you mentioned.

I then set all of these headers in the code and removed them one by one. As soon as I removed the Accept header, the server returned the 403 status code.

Exact request made by my browser:

GET / HTTP/1.0
Host: www.nseindia.com
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:2.0b8) Gecko/20100101 Firefox/4.0b8
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
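
For reference, here is a rough sketch of how the captured headers could be mirrored on an `HttpWebRequest`. `BrowserLikeRequest` and its `Create` method are hypothetical names, not part of .NET, and there is no guarantee that the server still accepts such a request today. The helper only builds the request; it does not send it.

```csharp
using System.Net;

static class BrowserLikeRequest
{
    // Hypothetical helper: builds (but does not send) a GET request whose
    // headers mirror the browser capture above.
    public static HttpWebRequest Create(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.UserAgent = "Mozilla/5.0 (Windows NT 5.1; rv:2.0b8) Gecko/20100101 Firefox/4.0b8";
        // The Accept property takes the header value only, without the "Accept:" name.
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        request.Headers["Accept-Language"] = "de,en;q=0.5";
        request.Headers["Accept-Charset"] = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        // Asks .NET to advertise gzip/deflate support and to decompress
        // the response body transparently.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }
}
```

In the question's `DownloadData` method, the result of `BrowserLikeRequest.Create(address)` could stand in for the manually created `request`. Headers can then be removed one at a time to find out which ones the server actually checks.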

PS: The other URIs you mention in the code comments seem to be invalid. One is incomplete and yields a 500 Internal Server Error; the other yields a 404 Not Found response.


Try setting the credentials to the defaults, like this:

request.Credentials = System.Net.CredentialCache.DefaultCredentials;

or

NetworkCredential nc = new NetworkCredential("user", "password");
request.Credentials = nc;

if you need a user name and password to access that web page.

Another option is to use the WebBrowser control ;)
