How do I use XPath in Nokogiri?

Developer https://www.devze.com 2022-12-16 20:49 Source: web

Related topics: nokogiri ruby

I have not found any documentation or tutorial for this. Does anything like that exist?

doc.xpath('//table/tbody[@id="threadbits_forum_251"]/tr')

The code above will get me any table, anywhere, that has a tbody child with the attribute id equal to "threadbits_forum_251". But why does it start with a double //? Why is there a /tr at the end? See "Ruby Nokogiri Parsing HTML table II" for more details.

Can anybody tell me how to extract href, id, alt, src, etc., using Nokogiri?

'td[3]/div[1]/a/text()' <--- extracts text

How can I extract other things?

It seems you need to read an XPath tutorial.

Your //table/tbody[@id="threadbits_forum_251"]/tr expression means:

// - anywhere in your XML document
table/tbody - take a table element with a tbody child
[@id="threadbits_forum_251"] - where the tbody's id attribute equals "threadbits_forum_251"
tr - and take that tbody's tr child elements

So, basically, you need to know:

attribute names begin with @
conditions go inside [] brackets

If I understood that API correctly, you can go with doc.at_xpath("td[3]/div[1]/a")["href"] (xpath returns a set of nodes, so pick a single node first), or td[3]/div[1]/a/@href if there is just one <a> element.

Your XPath is correct and you seem to have answered your own question's first part (almost):

doc.xpath('//table/tbody[@id="threadbits_forum_251"]/tr')

"the code above will get me any table's tr, anywhere, that has a tbody child with the attribute id equal to threadbits_forum_251"

// means the following element can appear anywhere in the document.

/tr at the end means: get the tr child nodes of the matching element.

You don't need to extract each attribute one by one. Just get the entire node containing all four attributes in Nokogiri, and read the attributes using:

theNode['href']
theNode['src']

Where theNode is your Nokogiri Node object.

Edit:

Sorry, I haven't used these libraries, but I think the XPath evaluation and parsing is being done by Mechanize. So here's how you would get the entire element and its attributes in one go.

doc.xpath("td[3]/div[1]/a").each do |anchor|
  # [] returns nil when the attribute is missing on this node
  puts anchor['href']
  puts anchor['src']
  # ...
end
