HTML Agility Pack Question (Attempting to parse string from source)

Developer https://www.devze.com 2023-02-23 04:39 Source: Internet

I am attempting to use the Agility Pack to parse certain bits of info from various pages. I am a bit worried that it might be overkill for what I need; if that is the case, feel free to let me know. Anyway, I am attempting to parse a page from Motley Fool to get the name of a company based on its ticker. I will be parsing several pages to get stock info in a similar way.

The HTML that I want to parse looks like:

<h1 class="subHead"> 
    Microsoft Corp <span>(NASDAQ:MSFT)</span>
</h1>

Also, the page I want to parse is: http://caps.fool.com/Ticker/MSFT.aspx

So my question is: how do I simply get "Microsoft Corp" out of the HTML, and should I even be using the Agility Pack for things like this?

Edit: Current code

public String getStockName(String ticker)
{
    String text ="";
    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load("http://caps.fool.com/Ticker/" + ticker + ".aspx");

    var node = doc.DocumentNode.SelectSingleNode("/h1[@class='subHead']");
    text = node.FirstChild.InnerText.Trim();
    return text;
}
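The likely reason the code above fails is the XPath. A single leading "/" anchors the path at the document root, so "/h1[@class='subHead']" only matches an <h1> that is itself the top-level element; on a real page the top-level element is <html>, SelectSingleNode returns null, and node.FirstChild then throws a NullReferenceException. "//h1[...]" matches at any depth. The sketch below shows the difference. It assumes a hypothetical <html><body> wrapper around the snippet and uses System.Xml.XmlDocument instead of HtmlAgilityPack, which only works because this sample is well-formed XML; the XPath path rules are the same.

```csharp
using System;
using System.Xml;

public static class XPathRootDemo
{
    // Hypothetical page shell around the question's <h1>; the real page is HTML, not XML.
    public const string Page =
        "<html><body>" +
        "<h1 class=\"subHead\">\n    Microsoft Corp <span>(NASDAQ:MSFT)</span>\n</h1>" +
        "</body></html>";

    // Returns how many nodes the given XPath expression selects in Page.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Page);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // "/h1" is anchored at the root, whose only element child is <html>, so nothing matches.
        Console.WriteLine(Count("/h1[@class='subHead']"));   // 0
        // "//h1" searches descendants at any depth.
        Console.WriteLine(Count("//h1[@class='subHead']"));  // 1
    }
}
```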

This would give you a list of all the stock names on the page; for your sample HTML, that is just Microsoft:

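HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("test.html");

// SelectNodes returns null (not an empty collection) when nothing matches
var nodes = doc.DocumentNode.SelectNodes("//h1[@class='subHead']");
if (nodes != null)
{
    foreach (var node in nodes)
    {
        // FirstChild is the text node before the <span>; Trim() drops the surrounding whitespace
        string text = node.FirstChild.InnerText.Trim(); //output: "Microsoft Corp"
        string textAll = node.InnerText.Trim(); //output: "Microsoft Corp (NASDAQ:MSFT)"
    }
}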
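In the sample HTML, the text in front of the <span> is " \n    Microsoft Corp " (leading space, newline and indentation included), so FirstChild.InnerText keeps that whitespace unless it is trimmed, while InnerText on the whole <h1> concatenates the text of every descendant, including the <span>. A minimal sketch of both, assuming the well-formed snippet can be loaded with System.Xml.XmlDocument in place of HtmlAgilityPack, whose HtmlNode exposes FirstChild and InnerText with the same meaning:

```csharp
using System;
using System.Xml;

public static class FirstChildDemo
{
    // The <h1> from the question as a standalone XML string.
    public const string Heading =
        "<h1 class=\"subHead\"> \n    Microsoft Corp <span>(NASDAQ:MSFT)</span>\n</h1>";

    public static XmlElement LoadHeading()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Heading);
        return doc.DocumentElement;
    }

    public static void Main()
    {
        XmlElement h1 = LoadHeading();

        // FirstChild is the text node in front of <span>, whitespace included.
        string raw = h1.FirstChild.InnerText;
        Console.WriteLine("[" + raw + "]");                  // leading space, newline and indentation still present
        Console.WriteLine("[" + raw.Trim() + "]");           // [Microsoft Corp]

        // InnerText on the element concatenates all descendant text.
        Console.WriteLine("[" + h1.InnerText.Trim() + "]");  // [Microsoft Corp (NASDAQ:MSFT)]
    }
}
```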

Edit based on the updated question: this should work for you. The key change is that the XPath starts with // instead of /:

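public String getStockName(String ticker)
{
    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();

    string url = string.Format("http://caps.fool.com/Ticker/{0}.aspx", ticker);
    HtmlAgilityPack.HtmlDocument doc = web.Load(url);

    // "//" searches the whole document; "/h1" would only match an <h1> that is the root element
    var node = doc.DocumentNode.SelectSingleNode("//h1[@class='subHead']");
    if (node == null)
        return "";  // heading not found, e.g. unknown ticker or changed page layout

    // FirstChild is the text node before the <span>, i.e. "Microsoft Corp"
    return node.FirstChild.InnerText.Trim();
}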

Use an XPath expression to select the element, then pick up its text.

// Prints the text inside the <span>, i.e. "(NASDAQ:MSFT)" for the sample HTML
foreach (var element in doc.DocumentNode.SelectNodes("//h1[@class='subHead']/span"))
{
    Console.WriteLine(element.InnerText);
}
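The expression above targets the <span>, so for the sample it yields the exchange and ticker rather than the company name. An XPath text() step can select the heading's own text node instead. A minimal sketch comparing the two, assuming a hypothetical <html><body> wrapper and again using System.Xml.XmlDocument in place of HtmlAgilityPack because the sample is well-formed; that HtmlAgilityPack's SelectSingleNode handles text() the same way is an assumption not shown here:

```csharp
using System;
using System.Xml;

public static class SpanVsTextDemo
{
    // Hypothetical wrapper around the question's <h1> snippet.
    public const string Page =
        "<html><body>" +
        "<h1 class=\"subHead\">\n    Microsoft Corp <span>(NASDAQ:MSFT)</span>\n</h1>" +
        "</body></html>";

    // Returns the trimmed text of the first node matched by xpath, or null if none matches.
    public static string FirstText(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Page);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // The <span> holds the exchange and ticker.
        Console.WriteLine(FirstText("//h1[@class='subHead']/span"));      // (NASDAQ:MSFT)
        // text() selects the <h1>'s own text children; [1] is the one before <span>.
        Console.WriteLine(FirstText("//h1[@class='subHead']/text()[1]")); // Microsoft Corp
    }
}
```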
