Parsing links and tables using VB.net HTML AgilityPack

Developer https://www.devze.com 2023-02-27 04:27 Source: Internet
I'm trying to do some screen scraping, and discovered the HTML AgilityPack, but am having some trouble figuring out how to use it with VB.net.

The first thing I want to do is find the URL string for an HREF tag if I know the text that is enclosed in the HREF.

The second thing I want to do is parse an HTML table, going through each row and pulling out the data so I can save it to a database (after some basic analysis).


Here is a good starting point on SO: How to use HTML Agility pack

See also this: HtmlAgilityPack example for changing links doesn't work. How do I accomplish this?

And this: Finding all the A HREF Urls in an HTML document (even in malformed HTML)

To find a specific HREF, the XPath syntax would be "//a[@href='your url']", meaning "get any A tag that has an HREF attribute equal to 'your url'".
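
As a rough illustration of that XPath in use, here is a minimal sketch. The sample markup, the class name FindByHref and the URL about.html are invented for the example; the HtmlAgilityPack calls themselves (LoadHtml, SelectNodes, GetAttributeValue) can be called from VB.net in the same way, only the syntax differs.

```csharp
using System;
using HtmlAgilityPack;

class FindByHref
{
    static void Main()
    {
        // Sample markup invented for this illustration.
        string html = @"<p><a href=""homepage.html"">Cars</a> <a href=""about.html"">About</a></p>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so check for null before looping.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href='about.html']");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                // Print the link text followed by its href value.
                Console.WriteLine(link.InnerText + " -> " + link.GetAttributeValue("href", ""));
            }
        }
    }
}
```

Use SelectSingleNode instead of SelectNodes when at most one match is expected; it also returns null when there is no match.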

EDIT:

To find an HREF when you only know the link text: for example, given the HTML '<a href="homepage.html">Cars</a>', where you know the text 'Cars' and want the URL homepage.html, this is how you would do it.

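        // Requires: using System; using HtmlAgilityPack;
        string s = @"<a href=""homepage.html"">Cars</a>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(s);

        // Match the <a> element whose text is exactly 'Cars'.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[text()='Cars']");

        // SelectSingleNode returns null when nothing matches.
        if (node != null)
            Console.WriteLine("href=" + node.GetAttributeValue("href", null));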
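
For the second part of the question (going through a table row by row), the same SelectNodes approach can be applied to the tr and td elements. The following is only a sketch under assumptions: the sample table, its id 'prices' and the class name TableRows are made up for illustration, nested tables are not handled, and the database step is left out because it depends on your schema.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class TableRows
{
    static void Main()
    {
        // Hypothetical sample table; real pages will differ.
        string html = @"<table id=""prices"">
            <tr><th>Model</th><th>Price</th></tr>
            <tr><td>Sedan</td><td>20000</td></tr>
            <tr><td>Coupe</td><td>25000</td></tr>
        </table>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<string[]>();

        // SelectNodes returns null when nothing matches, so guard before looping.
        HtmlNodeCollection trs = doc.DocumentNode.SelectNodes("//table[@id='prices']//tr");
        if (trs != null)
        {
            foreach (HtmlNode tr in trs)
            {
                // Relative XPath: only the td cells of this row.
                // A header row that contains only th cells yields null and is skipped.
                HtmlNodeCollection tds = tr.SelectNodes("./td");
                if (tds == null) continue;

                var cells = new string[tds.Count];
                for (int i = 0; i < tds.Count; i++)
                {
                    // DeEntitize converts entities such as &amp; back into characters.
                    cells[i] = HtmlEntity.DeEntitize(tds[i].InnerText).Trim();
                }
                rows.Add(cells);
            }
        }

        // At this point each entry in rows holds one table row, ready for analysis or storage.
        foreach (string[] cells in rows)
        {
            Console.WriteLine(string.Join(" | ", cells));
        }
    }
}
```

Collecting the cells into a list first keeps the parsing separate from whatever analysis or database insert follows.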