XPATH query, HtmlAgilityPack and Extracting Text

Developer https://www.devze.com 2023-01-02 06:03 Source: Internet
I have been trying to extract links that have the class "tim_new", and I have already been given a solution.

The solution, the snippet and the other necessary information are given here

The XPath query in that solution was "//a[@class='tim_new']". My question is: how did this query differentiate between the first line of the snippet (given in the link above) and the second line of the snippet?

More specifically, what is the literal translation (in English) of this XPath query?


Furthermore, I want to write a few lines of code to extract the text that follows "NSE:" in this markup:

<div class="FL gL_12 PL10 PT15">BSE: 523395 &nbsp;&nbsp;|&nbsp;&nbsp; NSE: 3MINDIA &nbsp;&nbsp;|&nbsp;&nbsp; ISIN: INE470A01017</div>

Would appreciate help in forming the necessary selection query.

My code is written as:

IEnumerable<string> NSECODE = doc.DocumentNode.SelectSingleNode("//div[@NSE:]");

But this doesn't look right. I would appreciate some help.


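The XPath in the first selection reads "select all a elements, anywhere in the document, that have an attribute named class with a value of tim_new". The part in brackets is not what you're returning; it's the criterion you're applying to the search. That is how the query tells the lines of your snippet apart: only an a element whose class attribute is exactly tim_new matches.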
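
To make that reading concrete, here is a minimal sketch using HtmlAgilityPack. The two-anchor HTML string and the names TimNewLinks and ExtractHrefs are made up for illustration (the snippet from the question is not reproduced here), so treat this as an example under those assumptions rather than the exact solution you were given.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TimNewLinks
{
    // Hypothetical helper: returns the href of every <a class="tim_new">.
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//a" picks anchors anywhere in the document;
        // "[@class='tim_new']" keeps only those whose class attribute equals tim_new.
        var anchors = doc.DocumentNode.SelectNodes("//a[@class='tim_new']");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (anchors == null) return new List<string>();

        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .ToList();
    }

    public static void Main()
    {
        // Made-up input: the first anchor has the class, the second does not.
        const string html =
            "<a class=\"tim_new\" href=\"/first\">First</a>" +
            "<a href=\"/second\">Second</a>";

        foreach (var href in ExtractHrefs(html))
            Console.WriteLine(href); // prints /first
    }
}
```

Keep in mind that @class='tim_new' is an exact string comparison, so an anchor with class="tim_new big" would not match this query.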

I don't have the HTML Agility Pack at hand, but if you are trying to query the divs that contain "NSE:" in their text, your XPath for the second query should just be "//div" (or "//div[text()]" to skip divs with no text of their own), and then you'll want to filter using LINQ.

Something like

var nodes = doc.DocumentNode
    .SelectNodes("//div[text()]")                // returns null if nothing matches
    .Where(a => a.InnerText.Contains("NSE:"));   // needs using System.Linq;

So in English: "Hand all the div elements that directly contain text to LINQ, then check that the inner text contains NSE:". Again, I'm not sure the syntax is perfect, but that's the idea.

The XPath "//div[@NSE:]" asks for all divs that have an attribute named NSE:, which is not a valid expression anyway, because in XPath a ":" in a name separates a namespace prefix from a local name. You're looking for the text of the element, not one of its attributes.
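
Putting the pieces together, one way to pull out just the NSE code is to select the div by its text and then pick the value out with a regular expression. This is only a sketch: the class name NseCode, the method ExtractNseCode, and the pattern (which assumes the code is the first run of non-whitespace characters after "NSE:") are my own assumptions, not something taken from your page.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class NseCode
{
    // Hypothetical helper: returns the token that follows "NSE:", or null if none is found.
    public static string ExtractNseCode(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // contains(., 'NSE:') tests the element's text content, not an attribute.
        var div = doc.DocumentNode.SelectSingleNode("//div[contains(., 'NSE:')]");
        if (div == null) return null;

        // InnerText may still contain entities such as &nbsp;, so decode them first.
        var text = HtmlEntity.DeEntitize(div.InnerText);

        // Assumption: the code is the first non-whitespace run after "NSE:".
        var match = Regex.Match(text, @"NSE:\s*(\S+)");
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        const string html =
            "<div class=\"FL gL_12 PL10 PT15\">BSE: 523395 &nbsp;&nbsp;|&nbsp;&nbsp; " +
            "NSE: 3MINDIA &nbsp;&nbsp;|&nbsp;&nbsp; ISIN: INE470A01017</div>";

        Console.WriteLine(ExtractNseCode(html)); // prints 3MINDIA
    }
}
```

Since SelectSingleNode returns a single HtmlNode rather than a sequence of strings, the result here is a plain string instead of IEnumerable<string>.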

Hope that helps.

Note: If you have nested divs that both contain text, as in <div>NSE: some text<div>NSE: more text</div></div>, you're going to get duplicate results, because the outer div's inner text also includes the inner div's text.
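
If the nested case matters, one option is to look only at each div's own text nodes rather than its full InnerText. A rough sketch, where the class name DirectText and the method OwnTextContaining are hypothetical and the input is the nested example from the note:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class DirectText
{
    // Hypothetical helper: returns the text that sits directly inside each div
    // (ignoring text of nested elements) when it contains the given marker.
    public static List<string> OwnTextContaining(string html, string marker)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("div")
            // Keep only the text nodes that are immediate children of the div.
            .SelectMany(div => div.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Text))
            .Select(n => n.InnerText.Trim())
            .Where(t => t.Contains(marker))
            .ToList();
    }

    public static void Main()
    {
        const string html = "<div>NSE: some text<div>NSE: more text</div></div>";

        // Each piece of text is reported once, under the div that owns it.
        foreach (var t in OwnTextContaining(html, "NSE:"))
            Console.WriteLine(t);
    }
}
```

The trade-off is that text split across child elements (for example by a span inside the div) is no longer joined together, so this only suits markup where the value sits directly in the div.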