
Ruby - Writing Hpricot data to a file

Developer https://www.devze.com 2023-01-01 19:10 Source: Internet

I am currently doing some XML parsing and I've chosen to use Hpricot because of its ease of use and syntax; however, I am running into some problems. I need to write a piece of XML data that I have found to another file. However, when I do this the format is not preserved. For example, if the content should look like this:

<dict>
  <key>item1</key><value>12345</value>
  <key>item2</key><value>67890</value>
  <key>item3</key><value>23456</value>
</dict>

Assume that there are many entries like this in the document. I am iterating through the 'dict' items using:

hpricot_element = Hpricot(xml_document_body)
f = File.new('some_new_file.xml', 'w')
(hpricot_element/:dict).each { |dict| f.write( dict.to_original_html ) }

After running the above code, I would expect the output to look exactly like the XML shown above. However, to my surprise, the contents of the file look more like this:

<dict>\n", "    <key>item1</key><value>12345</value>\n", "    <key>item2</key><value>67890</value>\n", "    <key>item3</key><value>23456</value\n", "  </dict>

I've tried splitting at the "\n" characters and writing to the file one line at a time, but that didn't work either, as it did not recognize the "\n" characters. Any help is greatly appreciated. It might be a very simple solution, but I am having trouble finding it. Thanks!


hpricot_element = Hpricot::XML(xml_document_body)

File.open('some_new_file.xml', 'w') {|f| f.write xml_document_body }

Don't use an XML parser if you just want the original XML to be written out; it is unnecessary. You should still use one if you want to process the data further, though.

Also, for XML, you should be using Hpricot::XML instead of just Hpricot.
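
As a minimal sketch of the "write the original string back" approach described above, the following uses only the Ruby standard library, so it runs without Hpricot installed. The sample document and the temporary directory are assumptions made for illustration; in the question the string would be xml_document_body as read from the real source.

```ruby
require 'tmpdir'

# Sample XML standing in for xml_document_body (an assumption for this sketch).
xml_document_body = <<~XML
  <dict>
    <key>item1</key><value>12345</value>
    <key>item2</key><value>67890</value>
  </dict>
XML

written_back = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'some_new_file.xml')
  # The block form opens the file for writing ('w') and closes it automatically.
  File.open(path, 'w') { |f| f.write(xml_document_body) }
  # Read the file back so the two strings can be compared.
  written_back = File.read(path)
end

# Reports whether the file contents match the original string.
puts written_back == xml_document_body
```

Because no parser touches the string on its way to the file, line breaks and indentation are written exactly as they were in the input.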


My solution was to just replace the literal '\n' characters with line breaks and remove the extra punctuation by simply adding two gsubs that looked like the following:

f.write( dict.to_original_html.gsub('\n', "\n").gsub('", "', '') )


I don't know why I didn't see this before. Like I said, it might be an easy answer that I wasn't seeing and that's exactly how it turned out. Thanks for all the answers!
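
The two gsub calls can be tried in isolation on a plain string, without Hpricot. This is only a sketch: the helper name clean_dumped_xml and the sample input are hypothetical, and the second pattern is written as `", "` (quote, comma, space, quote) so that it matches the separator visible in the sample output in the question.

```ruby
# Hypothetical helper; the two gsub calls mirror the ones in the answer.
def clean_dumped_xml(text)
  text
    .gsub('\n', "\n")  # single quotes: a literal backslash followed by n
    .gsub('", "', '')  # drop the quote-comma-quote left between lines
end

# Single-quoted, so each \n stays as two characters, like the broken file.
dumped = '<dict>\n", "  <key>item1</key><value>12345</value>\n", "</dict>'

cleaned = clean_dumped_xml(dumped)
puts cleaned
```

The quoting matters here: '\n' in single quotes is the two-character sequence that ended up in the file, while "\n" in double quotes is a real line break.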
