Getting all the anchor tags of a web page

Developer https://www.devze.com 2022-12-20 20:12 Source: Internet

Given a web URL, I want to detect all the links in a WEBSITE, identify the internal links and list them.

What I have is this:

            WebClient webClient = null;
            webClient = new WebClient();

            string strUrl = "http://www.anysite.com";
            string completeHTMLCode = "";

            try
            {
                completeHTMLCode = webClient.DownloadString(strUrl);
            }
            catch (Exception)
            {                    
            }

Using this I can read the contents of the page, but the only idea I have is to parse this string: search for <a, then href, then the value between the double quotes.

Is this the only way, or is there a better solution?


Use the HTML Agility Pack. Here's a link to a blog post to get you started. Do not use Regex.


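using HtmlAgilityPack;

 completeHTMLCode = 
   webClient.DownloadString(strUrl);

 // LoadHtml parses an HTML string; Load expects a file path or stream.
 HtmlDocument doc = new HtmlDocument();
 doc.LoadHtml(completeHTMLCode);

 // SelectNodes returns null when nothing matches, so check before iterating.
 HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null)
 {
   foreach (HtmlNode link in links)
   {
     // May be a relative URL; resolve it against strUrl before comparing hosts.
     string href = link.GetAttributeValue("href", "");
   }
 }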
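
Once the href values have been collected, the other half of the question (deciding which links are internal) could be handled with System.Uri rather than string comparison. The following is only a sketch: the LinkClassifier class and its InternalLinks method are hypothetical names, and it assumes "internal" means "same host as the page URL", which treats anysite.com and www.anysite.com as different sites.

```csharp
using System;
using System.Collections.Generic;

static class LinkClassifier
{
    // Hypothetical helper: returns the absolute form of every href that
    // points at the same host as pageUrl.
    public static List<string> InternalLinks(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();

        foreach (string href in hrefs)
        {
            // Resolve relative hrefs such as "/about" against the page URL;
            // skip values that cannot be parsed at all.
            if (!Uri.TryCreate(baseUri, href, out Uri resolved))
                continue;

            // Ignore mailto:, javascript: and other non-web schemes.
            if (resolved.Scheme != Uri.UriSchemeHttp &&
                resolved.Scheme != Uri.UriSchemeHttps)
                continue;

            // Assumption: a link is internal when its host matches the page's host.
            if (string.Equals(resolved.Host, baseUri.Host,
                              StringComparison.OrdinalIgnoreCase))
                result.Add(resolved.AbsoluteUri);
        }

        return result;
    }
}

static class Program
{
    static void Main()
    {
        // Sample hrefs are made up for illustration.
        var hrefs = new[] { "/about", "http://other.example/page", "mailto:someone@example.com" };

        foreach (string link in LinkClassifier.InternalLinks("http://www.anysite.com", hrefs))
            Console.WriteLine(link);
    }
}
```

In the loop over the anchor nodes, each href attribute value would be added to a list and that list passed to InternalLinks together with strUrl. If subdomains should also count as internal, the host comparison is the place to relax.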