I am trying to learn how to get the src of every img on a page at a given URL, but the imgs
variable in my code is always null. What am I doing wrong?
static void Main(string[] args)
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml("http://archive.ncsa.illinois.edu/primer.html");
    HtmlAgilityPack.HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img");
    if (imgs != null)
    {
        foreach (HtmlAgilityPack.HtmlNode img in imgs)
        {
            string imgSrc = img.Attributes["src"].Value;
        }
    }
    Console.ReadKey();
}
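You are using HtmlDocument.LoadHtml, which expects HTML source, not a URL. Your code parses the URL string itself as if it were markup, so the document contains no img elements and SelectNodes returns null.
You could use WebClient to download the HTML first, e.g.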
// WebClient is in System.Net; the using block disposes it when done.
using (WebClient wc = new WebClient())
{
    string html = wc.DownloadString("http://archive.ncsa.illinois.edu/primer.html");
    doc.LoadHtml(html);
}
HtmlDocument also has a Load method whose overloads read content from other sources, such as a file path, a Stream, or a TextReader.
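As an alternative to downloading the HTML yourself, here is a minimal sketch that uses HtmlAgilityPack's HtmlWeb class to fetch and parse the page in one step. HtmlWeb is not mentioned above, so treat it as one possible approach rather than the only one. The `//img[@src]` XPath and the empty-string fallback passed to GetAttributeValue are my own choices, made so that an img without a src attribute does not throw.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // HtmlWeb downloads the page over HTTP and parses it into an HtmlDocument.
        HtmlWeb web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://archive.ncsa.illinois.edu/primer.html");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
        if (imgs == null)
        {
            Console.WriteLine("No img elements with a src attribute were found.");
            return;
        }

        foreach (HtmlNode img in imgs)
        {
            // The second argument is the fallback used if the attribute is missing.
            string imgSrc = img.GetAttributeValue("src", string.Empty);
            Console.WriteLine(imgSrc);
        }
    }
}
```

Note that the src values come back exactly as written in the page, so relative paths would still need to be resolved against the page URL (for example with System.Uri) before you can download the images.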