I need to find the XML stylesheet declaration in a document that begins like this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/templates/xslt/inspections/disclaimer_en.xsl"?>
Using Nokogiri I tried:
doc.search("?xml-stylesheet").first['href']
but I get the error:
`on_error': unexpected '?' after '' (Nokogiri::CSS::SyntaxError)
Nokogiri's CSS selectors cannot match XML processing instructions. You can still reach them through the document's child nodes, for example:
doc.children[0]
This is not an XML element; this is an XML "Processing Instruction". That is why you could not find it with your query. To find it you want:
# Find the first xml-stylesheet PI
xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
# Find every xml-stylesheet PI
xsss = doc.xpath('//processing-instruction("xml-stylesheet")')
Seen in action:
require 'nokogiri'
xml = <<ENDXML
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/templates/disclaimer_en.xsl"?>
<root>Hi Mom!</root>
ENDXML
doc = Nokogiri.XML(xml)
xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
puts xss.name #=> xml-stylesheet
puts xss.content #=> type="text/xsl" href="/templates/disclaimer_en.xsl"
Since a Processing Instruction is not an Element, it does not have attributes; you cannot, for example, ask for xss['type'] or xss['href']; you will need to parse the content as an element if you wish this. One way to do this is:
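class Nokogiri::XML::ProcessingInstruction
  # Re-parse the PI's content as the attributes of an element with the same name.
  def to_element
    # Node#parse returns a NodeSet, so take the single element out of it.
    document.parse("<#{name} #{content}/>").first
  end
end
p xss.to_element['href'] #=> "/templates/disclaimer_en.xsl"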
Note that there exists a bug in Nokogiri or libxml2 which will cause the XML Declaration to appear in the document as a Processing Instruction if there is at least one character (can be a space) before <?xml. This is why in the above we search specifically for processing instructions with the name xml-stylesheet.
Edit: The XPath expression processing-instruction()[name()="foo"] is equivalent to the expression processing-instruction("foo"). As described in the XPath 1.0 spec:
"The processing-instruction() test may have an argument that is Literal; in this case, it is true for any processing instruction that has a name equal to the value of the Literal."
I've edited the answer above to use the shorter format.