I'm trying to do some screen scraping, and discovered the HTML AgilityPack, but am having some trouble figuring out how to use it with VB.net.
The first thing I want to do is find the URL string for an HREF tag if I know the text that is enclosed in the HREF.
The second thing I want to do is parse an HTML table, going through each row and pulling out the data so I can save it to a database (after some basic analysis).
Here is a good starting point on SO: How to use HTML Agility pack
See also this: HtmlAgilityPack example for changing links doesn't work. How do I accomplish this?
And this: Finding all the A HREF Urls in an HTML document (even in malformed HTML)
To find a specific HREF, the XPath syntax would be "//a[@href='your url']", meaning: "get any A tag that has an HREF attribute equal to 'your url'".
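As a rough sketch of that XPath in use (the class and method names LinkFinder and FindByHref are made up for illustration, and the code assumes the URL contains no single quote, since it is pasted straight into the XPath string):

```csharp
using HtmlAgilityPack;

public static class LinkFinder
{
    // Hypothetical helper: returns the first <a> node whose href attribute
    // equals the given URL exactly, or null if there is no such link.
    public static HtmlNode FindByHref(string html, string url)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: url has no single quote in it; otherwise the
        // XPath string literal below would be malformed.
        string xpath = "//a[@href='" + url + "']";

        // SelectSingleNode returns null when nothing matches.
        return doc.DocumentNode.SelectSingleNode(xpath);
    }
}
```

The comparison is an exact string match on the attribute value, so "homepage.html" will not match "/homepage.html" or "homepage.html?x=1".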
EDIT:
To find an HREF if you only know the link text, for example if you have the HTML '<a href="homepage.html">Cars</a>', know the text "Cars", and want to get homepage.html, then this is how you would do it:
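// Requires "using System;" and "using HtmlAgilityPack;" at the top of the file.
string s = @"<a href=""homepage.html"">Cars</a>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(s);
// SelectSingleNode returns null when no <a> has exactly the text 'Cars'.
HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[text()='Cars']");
if (node != null)
    Console.WriteLine("href=" + node.GetAttributeValue("href", null));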
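For the second part of the question (going through an HTML table row by row), something along these lines might work as a starting point. This is only a sketch: TableReader and ReadRows are hypothetical names, and it assumes every row of every table on the page is wanted, nested tables included.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableReader
{
    // Hypothetical helper: returns one list of cell texts per <tr>.
    // Rows with no <td> or <th> children are skipped.
    public static List<List<string>> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var trNodes = doc.DocumentNode.SelectNodes("//table//tr");
        if (trNodes == null)
            return rows;

        foreach (HtmlNode tr in trNodes)
        {
            // Only the direct cell children of this row.
            var cells = tr.SelectNodes("./td|./th");
            if (cells == null)
                continue;

            // Decode entities such as &amp; and trim surrounding whitespace.
            rows.Add(cells
                .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                .ToList());
        }
        return rows;
    }
}
```

Each inner list can then be checked and written to the database. The same calls are available from VB.NET, since HtmlAgilityPack is an ordinary .NET library; only the syntax differs.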