
.NET Link validator API?

Developer https://www.devze.com 2023-02-02 20:57 Source: Internet
Is anyone aware of a good link validation API? I am not looking for any kind of web crawler, just something to validate a full page or single links. I've been looking for one because I am having some problems with my own validator that I cannot solve at the moment.

A few of the major problems are:

  • Some async web requests never complete
  • Getting many false positives
  • Getting a 404 when the link is actually a redirect

I'll post my code in case it helps.

The first method starts the validation:

private void urlCheck( Link strUri )
{
    try
    {
        Uri uri = new Uri( strUri.URL , 
            ( strUri.URL.StartsWith( "/" ) ) ? 
                UriKind.Relative : UriKind.Absolute );

        if( !uri.IsAbsoluteUri )
            uri = new Uri( _page.HttpDomain + uri );

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create( uri );
        request.Method = "GET";
        request.UserAgent = 
            "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.2; Trident/4.0)";
        request.AllowAutoRedirect = true;
        request.AllowWriteStreamBuffering = true;
        request.SendChunked = true;
        request.UnsafeAuthenticatedConnectionSharing = true;
        request.KeepAlive = false;
        request.Referer = "http://www.google.ca/";
        // default : WebRequest.DefaultWebProxy
        request.Proxy = null; 
        request.Timeout = 20000;

        //do not revalidate this
        WebPageCollection.DoNotRevalidateLinks.Add( strUri );
        request.BeginGetResponse( new AsyncCallback( getResponseCallback ) , 
            request );
        _webRequest++;
    }
    catch( Exception ex )
    {
        Console.WriteLine( ex.StackTrace);
    }
}

The second method is the callback:

private void getResponseCallback( IAsyncResult result )
{
    HttpWebRequest request = (HttpWebRequest)result.AsyncState;
    string strUri = request.Address.ToString();

    Link href = new Link( strUri );
    href.URLKind = urlKind;
    href.URLType = UrlType.External;
    href.URLState = UrlState.Valid;

    try
    {
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();

        if( response.StatusCode == HttpStatusCode.Redirect )
        {
             //TODO: Redirects
             href.URLState = UrlState.Redirect;   
        }
    }
    catch( WebException wex )
    {
        href.URLState = UrlState.Broken;
    }

    _page.Links.Add( href );
    _webRequestComplete++;
    request.EndGetResponse( result );
}

The two incremented counters are there so I can check that both counts are equal. In many cases they are not, and I end up with an infinite loop.


Is there a reason you're setting SendChunked? That throws a ProtocolViolationException for me most of the time. Change the catch statement in your urlCheck() method to re-throw the exception so you can see it.
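
A minimal sketch of what that re-throw could look like, assuming a hypothetical helper named RunAndRethrow (it is not part of the original code) that wraps whatever starts the request. It hands the full exception text to a caller-supplied sink and then uses a bare throw, so the original stack trace is preserved:

```csharp
using System;

static class RethrowSketch
{
    // Hypothetical helper: runs `start`, reports any exception to `log`,
    // then re-throws it so the caller still sees the failure.
    public static void RunAndRethrow(Action start, Action<string> log)
    {
        try
        {
            start();
        }
        catch (Exception ex)
        {
            // ex.ToString() includes the exception type, message, and stack trace.
            log(ex.ToString());
            throw; // a bare throw keeps the original stack trace
        }
    }
}
```

In urlCheck(), the body of the existing try block would become the start delegate, and log could write to System.Diagnostics.Trace or a file instead of the console.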

UPDATE

Sorry to keep hammering the point, but I think you're losing the error. It sounds like you're doing this on an ASPX page (you mentioned web.config), but you're using Console.WriteLine in your catch, so you never see it. According to MSDN, a ProtocolViolationException is thrown when:

Method is GET or HEAD, and either ContentLength is greater than zero or SendChunked is true.
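
The quoted rule can be written down as a small check. This is only a sketch: ViolatesBodylessRule and ConfigureForLinkCheck are hypothetical names, and the check restates the MSDN sentence above rather than reproducing the framework's internal logic. For a link check with GET, the simplest fix is probably to leave SendChunked at its default of false:

```csharp
using System;
using System.Net;

static class SendChunkedSketch
{
    // Restates the quoted MSDN condition: a GET or HEAD request must not
    // declare a body, either via ContentLength > 0 or via SendChunked.
    public static bool ViolatesBodylessRule(string method, long contentLength, bool sendChunked)
    {
        bool bodyless = method == "GET" || method == "HEAD";
        return bodyless && (contentLength > 0 || sendChunked);
    }

    // Hypothetical setup for a link check: a plain GET with no body settings.
    public static HttpWebRequest ConfigureForLinkCheck(Uri uri)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        request.AllowAutoRedirect = true;
        // SendChunked is deliberately not set; it defaults to false.
        return request;
    }
}
```

With the settings from the question (Method "GET", SendChunked true), ViolatesBodylessRule returns true, which matches the exception described above.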
