Nokogiri: How to retrieve the text from an XML::Element, excluding the text from its descendants?

Developer https://www.devze.com 2023-03-19 04:37 Source: web

Related topics: nokogiri ruby

Is there a more elegant way to write the following code?

def get_text(element)
  text_node = element.children.find &:text?
  text_node.text if text_node
end

You can write

element.xpath('text()').to_s

which returns the raw text of the text children of element, excluding any text in descendant nodes (whereas your code only returns the first text child of element).

Remember that the DOM is hierarchical so you need to remove the child nodes:

Starting with this:

require 'nokogiri'

xml = <<EOT
<xml>
  <a>some text
    <b>
      <c>more text</c>
    </b>
  </a>
</xml>
EOT

doc = Nokogiri::XML(xml)

If you don't mind doing it destructively:

doc.at('b').remove
doc.text #=> "\n  some text\n    \n  \n"

If you do mind:

a_node = Nokogiri::XML.fragment(doc.at('a').to_xml)

a_node.at('b').remove
a_node.text #=> "some text\n    \n  "

Strip the leading and trailing whitespace (for example with String#strip) and you should be good to go.

Of course, this syntax will also help you:

doc = Nokogiri::Slop <<-EOXML
<employees>
  <employee status="active">
    <fullname>Dean Martin</fullname>
  </employee>
  <employee status="inactive">
    <fullname>Jerry Lewis</fullname>
  </employee>
</employees>
EOXML
# navigate!
doc.employees.employee.last.fullname.content # => "Jerry Lewis"

# plain XPath also works on the same document
fullname = doc.xpath("//fullname")
puts fullname.text