How to keep blank results for Nokogiri's NodeSet.search method

Developer https://www.devze.com 2023-01-31 12:23 Source: Internet

I want to run the search method of Nokogiri::XML::NodeSet on a NodeSet called nodeset, using an XPath rule, like this:

nodeset.search(rule)

The code above returns a NodeSet, but it contains nothing for the elements that do not match the rule. What I want is this: if an element in nodeset matches the rule, return the matched result; if it does not match, return a blank string in its place, so that I can tell which elements of the calling nodeset matched and which did not.

Could someone tell me how to do this? I would appreciate your help very much.

Nokogiri's NodeSet supports set operations similar to Ruby arrays. Instead of keeping blanks in your matched set, find the missed items after the fact:

require 'nokogiri'

doc = Nokogiri::XML <<-ENDXML
<root>
  <a id="a1" class="foo">
    <a id="a1a" class="foo" />
    <a id="a1b" class="foo" andalso="this" />
  </a>
  <a id="a2" class="foo" andalso="this">
    <a id="a2a" class="bar" />
    <a id="a2b" class="bar" andalso="this" />
  </a>
  <a id="a3" class="foo" andalso="this" />
</root>
ENDXML

foos = doc.xpath('//a[@class="foo"]')
p foos.map{ |e| e['id'] }
#=> ["a1", "a1a", "a1b", "a2", "a3"]

subselect = foos.xpath('self::*[@andalso="this"]')
p subselect.map{ |e| e['id'] }
#=> ["a1b", "a2", "a3"]

missed = foos - subselect
p missed.map{ |e| e['id'] }
#=> ["a1", "a1a"]

If you really want non-nodes in the result, you'll have to use #map instead of #search or other Nokogiri methods and get an Array instead of a NodeSet:

subselect = foos.map do |el|
  if el['andalso']=='this'
    el
  else
    ""
  end
end
p subselect.map{ |e| e=="" ? "" : e['id'] }
#=> ["", "", "a1b", "a2", "a3"]

I don't know Nokogiri well enough to know how well this will work, but I suspect the following example suggests a way forward. It assumes that NodeSet behaves like a Ruby array, which it does according to its API docs [1]:

a = (0..9).to_a
 => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
evens = a.select { |i| i % 2 == 0 }
 => [0, 2, 4, 6, 8]
odds = a - evens
 => [1, 3, 5, 7, 9]

I believe you should be able to do something similar with your nodeset so that when your search has been performed, you can find the non-matched nodes by subtracting the new nodeset from the original one.

[1] http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/NodeSet.html#M000448

Here's how I'd go about it:

require 'nokogiri'

xml = <<EOT
<xml>
  <find_node>foo</find_node>
  <ignore_node>bar</ignore_node>
  <find_node>foo</find_node>
  <ignore_node>bar</ignore_node>
</xml>
EOT

# parse the document...
doc = Nokogiri::XML(xml)

# find the nodes we want...
desired_nodes = doc.search('//find_node')

# see if it's working...
desired_nodes.map{ |n| n.to_xml } # => ["<find_node>foo</find_node>", "<find_node>foo</find_node>"]

# walk the tree, grabbing the text or '' depending on whether the node is a hit or a miss...
node_result = doc.search('/xml/*').map{ |n| desired_nodes.include?(n) ? n.text : '' }

# ** here's the result **
node_result # => ["foo", "", "foo", ""]

# if we wanted to we could grab the desired_nodes' text...
desired_nodes.map{ |n| n.text } # => ["foo", "foo"]

# or find the ignored nodes...
ignored_nodes = doc.search('/xml/*') - desired_nodes
ignored_nodes.map{ |n| n.to_xml } # => ["<ignore_node>bar</ignore_node>", "<ignore_node>bar</ignore_node>"]

# ...and grab the ignored_nodes' text...
ignored_nodes.map{ |n| n.text } # => ["bar", "bar"]