Where can I find official documentation for the text() XPath query syntax?

Developer https://www.devze.com 2023-02-19 20:22 Source: web

Take this syntax, for example:

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//text()');

Where can I find official PHP/XPath docs that explain it?

The notation:

text()

is a node test as defined in the W3C XPath 1.0 specification, which is the only official XPath 1.0 definition.

In particular, the spec says:

"The node test text() is true for any text node".

And a "text node" is one of the seven different kinds of nodes in the XPath data model.
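To make the difference between node kinds concrete, here is a minimal sketch in Ruby. It assumes REXML (which ships with Ruby) rather than any particular third-party parser, and the sample document and variable names are made up for illustration. Each node test should select only the children of the matching kind.

```ruby
require 'rexml/document'

# Hypothetical sample: <root> has a comment, a text node and an element as children.
doc = REXML::Document.new('<root><!-- note -->hello<child>inner</child></root>')

# text() selects only the text-node children of <root>.
text_nodes = REXML::XPath.match(doc, '/root/text()')

# comment() selects only the comment-node children.
comment_nodes = REXML::XPath.match(doc, '/root/comment()')

# * selects only the element children.
element_nodes = REXML::XPath.match(doc, '/root/*')

puts text_nodes.map(&:value).inspect
puts comment_nodes.map(&:string).inspect
puts element_nodes.map(&:name).inspect
```

Note that the text inside <child> is not returned by /root/text(), because it is a child of <child>, not of <root>.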


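The official spec is on the W3C site, where text() is documented under node tests. In short, a text node is a node that holds character data, as distinct from element, attribute, or comment nodes. The contents of a script or style element are character data as well, so they also show up as text nodes.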


This set of XPath examples might help. They're using Ruby, because that's my favorite language, but the important thing is the output.

require 'nokogiri' # load the parser

xml1 =<<EOT
<xml><node1>test1 text</node1></xml>
EOT

doc = Nokogiri::XML(xml1)      # parse the first XML sample
nodes = doc.search('//text()') # find the text() nodes

# inspect the nodes...
nodes # => [#<Nokogiri::XML::Text:0x8054a508 "test1 text">]

# display the nodes' content as text
puts nodes
# >> test1 text

xml2 =<<EOT
<xml>
  <node1>test1 text</node1>
</xml>
EOT

doc = Nokogiri::XML(xml2)
nodes = doc.search('//text()') 
nodes # => [#<Nokogiri::XML::Text:0x80549608 "\n  ">, #<Nokogiri::XML::Text:0x8054952c "test1 text">, #<Nokogiri::XML::Text:0x805493b0 "\n">]
puts nodes
# >> 
# >>   
# >> test1 text
# >> 

html =<<EOT
<html>
  <head>
    <script type="text/javascript"><!-- javascript --></script>
    <style type="text/css"><!-- style sheet --></style>
  </head>
  <body>
    text
    <p>p tag</p>
  </body>
</html>
EOT

doc = Nokogiri::HTML(html)
nodes = doc.search('//text()') 
nodes # => [#<Nokogiri::XML::CDATA:0x80548834 "<!-- javascript -->">, #<Nokogiri::XML::CDATA:0x80548730 "<!-- style sheet -->">, #<Nokogiri::XML::Text:0x805485a0 "\n    text\n    ">, #<Nokogiri::XML::Text:0x80548488 "p tag">, #<Nokogiri::XML::Text:0x8054830c "\n  ">]
puts nodes
# >> <!-- javascript -->
# >> <!-- style sheet -->
# >> 
# >>     text
# >>     
# >> p tag
# >> 
# >>   

If you compare the inspection and the printed output to the parsed XML and HTML, you can see a corresponding text node for each run of text between tags. Some of them contain only whitespace, e.g. "\n". In other words, text nodes include the line breaks and indentation that separate tags written on separate lines, as well as the text that occurs between tags. That applies to both XML and HTML.
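If the whitespace-only nodes are not wanted, they can be filtered out after the query. The following is only a sketch: it assumes REXML (bundled with Ruby) instead of Nokogiri, and the variable names are illustrative. In XPath itself the predicate //text()[normalize-space()] expresses the same filter; the sketch filters in Ruby so that it does not depend on how a given engine evaluates that predicate.

```ruby
require 'rexml/document'

# Same shape as the indented sample: tags on separate lines.
xml = "<xml>\n  <node1>test1 text</node1>\n</xml>"
doc = REXML::Document.new(xml)

# Every text node, including the whitespace between tags.
all_text = REXML::XPath.match(doc, '//text()')

# Keep only the text nodes that contain something other than whitespace.
significant = all_text.reject { |node| node.value.strip.empty? }

puts "all text nodes: #{all_text.size}"
puts "non-whitespace: #{significant.map(&:value).inspect}"
```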

You can also see that the CSS and JavaScript inside the <style> and <script> tags are matched by text() too. Nokogiri reports them as CDATA, which is a way to define the type of text, but for our purposes it's still text.
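A small sketch of the same point, again assuming REXML and a made-up, well-formed XHTML-like page (REXML is an XML parser, not an HTML one): the stylesheet text is returned by //text(), and narrowing the path to <body> is one way to leave it out.

```ruby
require 'rexml/document'

# Hypothetical well-formed page with a stylesheet in <head>.
page = '<html><head><style>p { color: red }</style></head>' \
       '<body><p>p tag</p></body></html>'
doc = REXML::Document.new(page)

# text() matches character data regardless of which element holds it.
texts = REXML::XPath.match(doc, '//text()').map(&:value)

# Restrict the search to <body> to skip the stylesheet text.
body_texts = REXML::XPath.match(doc, '//body//text()').map(&:value)

puts texts.inspect
puts body_texts.inspect
```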


This is a good XPath 1.0 tutorial:

http://www.zvon.org/xxl/XPathTutorial/General/examples.html
