I'm working on a link checker (broken link finder) and I'm getting many false positives. After double-checking, I noticed that many URLs throw a WebException even though they are actually downloadable, and in other cases the status code is 404 even though I can open the page in the browser.
So here is the code. It's pretty ugly, and I'd like something more practical. All the status codes in that big if are there to filter out the ones I don't want to add to _brokenlinks, because they are valid links (I tested them all). What I need to fix is the structure (if possible) and how to avoid false 404s.
Thank you!
```csharp
// Fragment from inside the link-checking method; needs "using System.Net;".
try
{
   HttpWebRequest request = ( HttpWebRequest ) WebRequest.Create ( uri );
   request.Method = "HEAD"; // HTTP method names are case-sensitive; "Head" may be rejected
   request.MaximumResponseHeadersLength = 32; // in kilobytes; limits header size, not speed
   request.AllowAutoRedirect = true;
   using ( HttpWebResponse response = ( HttpWebResponse ) request.GetResponse() )
   {
      // Getting here means the request succeeded; disposing releases the connection.
   }
   /* WebClient wc = new WebClient();
     wc.DownloadString( uri ); */
   _validlinks.Add ( strUri );
}
catch ( WebException wex )
{
   // Compare Status instead of the (localized) Message text.
   if (    wex.Status != WebExceptionStatus.NameResolutionFailure &&
           wex.Status != WebExceptionStatus.ServerProtocolViolation )
   {
      // Response is null when no HTTP response arrived (timeout, refused connection, ...).
      HttpWebResponse errorResponse = wex.Response as HttpWebResponse;
      if ( wex.Status != WebExceptionStatus.Timeout && errorResponse != null )
      {
         HttpStatusCode code = errorResponse.StatusCode;
         errorResponse.Close();
         if (
            code != HttpStatusCode.OK &&
            code != HttpStatusCode.BadRequest &&
            code != HttpStatusCode.Accepted &&
            code != HttpStatusCode.InternalServerError &&
            code != HttpStatusCode.Forbidden &&
            code != HttpStatusCode.Redirect &&
            code != HttpStatusCode.Found
         )
         {
            _brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
         }
         else _validlinks.Add ( strUri );
      }
      else _brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
   }
   else _validlinks.Add ( strUri );
}
```
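One way to tidy the structure is to pull the decision out of the catch block into a single function backed by two lookup tables. The sketch below is not from the question; the LinkClassifier class and its IsBroken methods are hypothetical names. It tries to keep the question's rules (name-resolution failures and protocol violations count as valid, timeouts count as broken, the listed HTTP codes count as valid), except that OK and Accepted are dropped: a 2xx response is returned normally by GetResponse() rather than thrown, so those checks could never match. Redirect and Found are the same enum value (302), so only one entry is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper: the nested ifs from the catch block flattened into
// one decision function with two lookup tables.
public static class LinkClassifier
{
    // Transport-level failures that the question's code counts as valid links.
    private static readonly HashSet<WebExceptionStatus> ValidStatuses =
        new HashSet<WebExceptionStatus>
        {
            WebExceptionStatus.NameResolutionFailure,
            WebExceptionStatus.ServerProtocolViolation,
        };

    // HTTP error codes the question found to be reachable anyway.
    // HttpStatusCode.Found and HttpStatusCode.Redirect are both 302.
    private static readonly HashSet<HttpStatusCode> ValidCodes =
        new HashSet<HttpStatusCode>
        {
            HttpStatusCode.BadRequest,
            HttpStatusCode.Forbidden,
            HttpStatusCode.InternalServerError,
            HttpStatusCode.Found,
        };

    // status: WebException.Status; code: the status code carried by
    // WebException.Response, or null when there was no HTTP response.
    public static bool IsBroken(WebExceptionStatus status, HttpStatusCode? code)
    {
        if (ValidStatuses.Contains(status)) return false;
        if (code == null) return true; // timeout, refused connection, ...
        return !ValidCodes.Contains(code.Value);
    }

    // Convenience overload that extracts both values from the exception.
    // The caller still owns (and should dispose) wex.Response.
    public static bool IsBroken(WebException wex)
    {
        HttpWebResponse response = wex.Response as HttpWebResponse;
        HttpStatusCode? code = response != null ? response.StatusCode : (HttpStatusCode?)null;
        return IsBroken(wex.Status, code);
    }
}
```

With this in place the whole catch body reduces to a single if: when LinkClassifier.IsBroken(wex) is true, add the Href to _brokenlinks, otherwise add strUri to _validlinks. Changing which codes count as valid then means editing a set instead of a chain of conditions.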
You should set the UserAgent property (the User-Agent header), since many websites reject requests that don't send one.
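A minimal sketch of that advice, combined with a second technique that is not part of the answer: retrying with GET when the HEAD request comes back with an error, because some servers answer HEAD with 404, 405 or similar even though a normal GET (which is what the browser sends) succeeds. The LinkProbe class, the User-Agent string, the 15-second timeout and the set of codes that trigger a retry are all assumptions for illustration, not values from the question.

```csharp
using System;
using System.Net;

// Hypothetical helper: probes a URL with HEAD first and retries with GET when
// the server rejects HEAD. Returns the final status code, or null if no HTTP
// response arrived at all (DNS failure, timeout, refused connection, ...).
public static class LinkProbe
{
    // Arbitrary browser-like example value; adjust as needed.
    private const string UserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) LinkChecker/1.0";

    // Codes that some servers return for HEAD even though GET works.
    // Which codes deserve a retry is a judgment call, not a rule.
    public static bool ShouldRetryWithGet(HttpStatusCode code)
    {
        return code == HttpStatusCode.BadRequest
            || code == HttpStatusCode.Forbidden
            || code == HttpStatusCode.NotFound
            || code == HttpStatusCode.MethodNotAllowed
            || code == HttpStatusCode.NotImplemented
            || code == HttpStatusCode.InternalServerError;
    }

    public static HttpStatusCode? Probe(Uri uri)
    {
        HttpStatusCode? status = Send(uri, "HEAD");
        if (status.HasValue && ShouldRetryWithGet(status.Value))
        {
            status = Send(uri, "GET");
        }
        return status;
    }

    private static HttpStatusCode? Send(Uri uri, string method)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = method;
        request.UserAgent = UserAgent;
        request.AllowAutoRedirect = true;
        request.Timeout = 15000; // milliseconds
        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                // Only the status line matters; this code never reads the body.
                return response.StatusCode;
            }
        }
        catch (WebException wex)
        {
            // A using statement accepts null, so this is safe when there is no response.
            using (HttpWebResponse errorResponse = wex.Response as HttpWebResponse)
            {
                return errorResponse != null ? errorResponse.StatusCode : (HttpStatusCode?)null;
            }
        }
    }
}
```

The GET retry costs an extra round trip only for links that already failed, so well-behaved servers are still checked with a single cheap HEAD request.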