I am developing a tool for validation of links in url entered. suppose i have entered a url (e.g http://www-review-k6.thinkcentral.com/content/hsp/science/hspscience/na/gr3/se_9780153722271_/content/nlsg3_006.html ) in textbox1 and i want to check whether the contents of all the links exists on remote server or not. 开发者_如何学Pythonfinally i want a log file for the broken links.
You can use HttpWebRequest.
Note four things
1) The webRequest will throw exception if the link doesn't exist
2) You may like to disable auto redirect
3) You may also like to check if it's a valid url. If not, it will throw UriFormatException.
UPDATED
4) Per Paige suggested , Use "Head" in request.Method so that it won't download the whole remote file
static bool UrlExists(string url)
{
try
{
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
request.Method = "HEAD";
request.AllowAutoRedirect = false;
request.GetResponse();
}
catch (UriFormatException)
{
// Invalid Url
return false;
}
catch (WebException ex)
{
// Valid Url but not exists
HttpWebResponse webResponse = (HttpWebResponse)ex.Response;
if (webResponse.StatusCode == HttpStatusCode.NotFound)
{
return false;
}
}
return true;
}
Use the HttpWebResponse class:
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://www.gooogle.com/");
HttpWebResponse response = (HttpWebResponse)webRequest.GetResponse();
if (response.StatusCode == HttpStatusCode.NotFound)
{
// do something
}
bool LinkExist(string link)
{
HttpWebRequest webRequest = (HttpWebRequest) webRequest.Create(link);
HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();
return !(webResponse.StatusCode != HttpStatusCode.NotFound);
}
Use an HTTP HEAD request as explained in this article: http://www.eggheadcafe.com/tutorials/aspnet/2c13cafc-be1c-4dd8-9129-f82f59991517/the-lowly-http-head-reque.aspx
Make a HTTP request to the URL and see if you get a 404 response. If so then it does not exist.
Do you need a code example?
If your goal is robust validation of page source, consider usign a tool that is already written, like the W3C Link Checker. It can be run as a command-line program that handles finding links, pictures, css, etc and checking them for validity. It can also recursively check an entire web-site.
精彩评论