Should REXML ignore indentation or whitespace?
I am debugging an issue with a simple HTML to Markdown converter. For some reason it fails on
<blockquote><p>foo</p></blockquote>
But not on
<blockquote>
<p>foo</p>
</blockquote>
The reason is that in the first case type.children.first.value cannot be set, whereas in the second case it can.
The original code can be found at the link above, but a condensed snippet that shows the problem is below:
require 'rexml/document'
include REXML

def parse_string(string)
  doc = Document.new("<root>\n"+string+"\n</root>")
  root = doc.root
  root.elements.each do |element|
    parse_element(element, :root)
  end
end

def parse_element(element, parent)
  @output = ''
  # ...
  @output << opening(element, parent)
  #...
end

def opening(type, parent)
  case type.name.to_sym
  #...
  when :blockquote
    # remove leading newline
    type.children.first.value = ""
    "> "
  end
end

# Parses just fine
puts parse_string("<blockquote>\n<p>foo</p>\n</blockquote>")
# Fails with undefined method `value=' for <p> ... </>:REXML::Element (NoMethodError)
puts parse_string("<blockquote><p>foo</p></blockquote>")
I am fairly certain this is due to some setting that makes REXML sensitive to whitespace and indentation: why else would it parse the first XML differently from the second?
Can I force REXML to parse both the same way? Or am I looking at a different kind of bug entirely?
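Try passing the option :ignore_whitespace_nodes => :all as the second argument to Document.new(). REXML then drops whitespace-only text nodes, so both inputs produce the same tree.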
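To see the effect of the option, here is a minimal sketch that lists the node classes under <blockquote> for both inputs, with and without it. The helper name blockquote_child_classes is made up for this illustration and is not part of REXML or of the original converter.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): returns the class names of the
# direct children of the first <blockquote> under the wrapping <root>.
def blockquote_child_classes(html, context = {})
  doc = REXML::Document.new("<root>\n#{html}\n</root>", context)
  doc.root.elements['blockquote'].children.map { |child| child.class.name }
end

compact  = "<blockquote><p>foo</p></blockquote>"
indented = "<blockquote>\n<p>foo</p>\n</blockquote>"

# Default parsing keeps whitespace-only text nodes, so the trees differ:
# the indented input has a Text node before and after the <p> element.
p blockquote_child_classes(compact)
p blockquote_child_classes(indented)

# With the option, whitespace-only text nodes are dropped in both cases.
opts = { ignore_whitespace_nodes: :all }
p blockquote_child_classes(compact, opts)
p blockquote_child_classes(indented, opts)
```

Note that once the option is set, the first child of <blockquote> is the <p> element in both cases, so the line that assigns type.children.first.value = "" would presumably need to go, or be guarded with a check such as type.children.first.is_a?(REXML::Text).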