I am messing around trying to write a small web crawler. I parse out a URL from some HTML, and sometimes I get a PHP redirect page. I am looking for a way to get the URI of the redirected page.
I am trying to use System.Net.WebRequest to get a stream using code like this:
WebRequest req = WebRequest.Create(link);
Stream s = req.GetResponse().GetResponseStream();
StreamReader st = new StreamReader(s);
The problem is that the link is a PHP redirect, so the stream is always null. How would I get the URI of the page the PHP is redirecting to?
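HttpWebRequest req = (HttpWebRequest)WebRequest.Create(link);
req.AllowAutoRedirect = true;
req.AutomaticDecompression = DecompressionMethods.GZip;
// The stream and the character set both come from the response, not the request.
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
StreamReader st = new StreamReader(resp.GetResponseStream(), System.Text.Encoding.GetEncoding(resp.CharacterSet));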
AllowAutoRedirect will automatically take you to the new URI, if that is your desired effect. AutomaticDecompression will automatically decompress compressed responses. Also, you should execute the get-response-stream part in a try/catch block. In my experience it throws a lot of WebExceptions.
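A minimal sketch of the try/catch advice, assuming the link is an http or https URL (otherwise the cast to HttpWebRequest fails). The class name RedirectHelper, the method TryGetFinalUri, and the 10 second timeout are made up for illustration. When redirects are followed, HttpWebResponse.ResponseUri holds the URI that actually responded, which is what the question asks for.

```csharp
using System;
using System.Net;

static class RedirectHelper
{
    // Hypothetical helper: follows redirects for `link` and returns the URI
    // that finally answered, or null if the request failed.
    public static Uri TryGetFinalUri(string link)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(link);
        req.AllowAutoRedirect = true;
        req.Timeout = 10000; // 10 s; an arbitrary value chosen for this sketch

        try
        {
            using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
            {
                // ResponseUri is the URI of the resource that actually responded,
                // i.e. the redirect target when a redirect was followed.
                return resp.ResponseUri;
            }
        }
        catch (WebException ex)
        {
            // Covers DNS failures, timeouts, and HTTP error statuses (4xx/5xx).
            Console.Error.WriteLine("Request failed: " + ex.Status);
            return null;
        }
    }
}
```

Calling RedirectHelper.TryGetFinalUri(link) from the crawler returns null instead of throwing, so a single bad link does not abort the crawl.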
Since you're experimenting with this technology, make sure you read the data with the correct encoding. If you attempt to read data from a Japanese site without using an appropriate encoding (such as Unicode), the text will come out garbled.
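One way to pick the encoding defensively is sketched below. It assumes the charset name comes from HttpWebResponse.CharacterSet, which can be empty or hold a name the runtime does not recognise. EncodingHelper and PickEncoding are hypothetical names, and falling back to UTF-8 is a choice made for this sketch, not something the response guarantees.

```csharp
using System;
using System.Text;

static class EncodingHelper
{
    // Hypothetical helper: maps a charset name (e.g. from resp.CharacterSet)
    // to an Encoding, falling back to UTF-8 when the name is missing or unknown.
    public static Encoding PickEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return Encoding.UTF8;
        }

        try
        {
            return Encoding.GetEncoding(charset);
        }
        catch (ArgumentException)
        {
            // GetEncoding throws ArgumentException for names it does not support.
            return Encoding.UTF8;
        }
    }
}
```

It could then be passed to the reader as new StreamReader(resp.GetResponseStream(), EncodingHelper.PickEncoding(resp.CharacterSet)).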
Check the "Location" header of the response; it should contain the new URL.
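A rough sketch of reading that header, assuming the redirect is a real HTTP 3xx response. AllowAutoRedirect has to be false here, otherwise the redirect is followed before the header can be inspected. LocationHeaderExample, ResolveLocation, and GetRedirectTarget are names invented for this example. The Location value may be relative, so it is resolved against the requested URI.

```csharp
using System;
using System.Net;

static class LocationHeaderExample
{
    // Resolves a Location header value (absolute or relative) against the
    // URI that was requested. Returns null when no header value is present.
    public static Uri ResolveLocation(Uri requested, string location)
    {
        return string.IsNullOrEmpty(location) ? null : new Uri(requested, location);
    }

    // Hypothetical helper: returns the redirect target for `link`,
    // or null when the response is not a 3xx redirect.
    public static Uri GetRedirectTarget(string link)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(link);
        req.AllowAutoRedirect = false; // keep the 3xx response instead of following it

        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            int status = (int)resp.StatusCode;
            if (status < 300 || status >= 400)
            {
                return null; // not a redirect
            }
            return ResolveLocation(resp.ResponseUri, resp.Headers["Location"]);
        }
    }
}
```

GetRedirectTarget can still throw a WebException for network failures or error statuses, so the try/catch advice applies to it as well.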