Parse data from multiple XML files and output to csv file

Developer https://www.devze.com 2023-02-19 06:37 Source: web

I've got a dozen XML files which contain the results of some wcat web performance tests. Within each XML file there are data nodes that contain the name of each page requested and the average time it took to load. I want to extract that information from each XML file and output it to a CSV file so I can create a nice pretty graph in Excel.

I could do the task in my main working language, C#, but in an attempt to improve my scripting skills I'd like to try doing it with Unix/Cygwin commands or a scripting language such as Ruby.

The format of the XML file is:

<report name="wcat" version="6.3.1" level="1" top="100">
 <section name="header" key="90000">
  ... lots of other XML junk...
  <item>
   <data name="reportt" >Request Name I</data>
   ...
   <data name="avgttlb" >628</data>
  </item>
  <item>
   <data name="reportt" >Request Name II</data>
   ...
   <data name="avgttlb" >793</data>
  </item>
  ... lots of other XML junk...
 </section>
</report>

And the csv output I need is:

Request,File 1,File 2,...,File 12
Request Name I,628,123,...,789
Request Name II,793,456,...,987

Are there any good Cygwin command line utilities that could parse the XML? Or, failing that, is there a nice way to do it in Ruby?

What you're describing could be done in XSLT, which supports a text output method, multiple input files (using the document() function), and of course templates.

I know some people find XSLT gross, but I use it all the time for this kind of thing and rather like it. Plus it's pretty much platform-independent.

Ruby has a nice parser called Nokogiri, that I really like. It supports both XML and HTML, DOM and SAX, and can build XML if that's your fancy. It's built on libxml2.

#!/usr/bin/env ruby -w

xml = <<END_XML
<report name="wcat" version="6.3.1" level="1" top="100">
<section name="header" key="90000">
  <item>
    <data name="reportt" >Request Name I</data>
    <data name="avgttlb" >628</data>
  </item>
  <item>
    <data name="reportt" >Request Name II</data>
    <data name="avgttlb" >793</data>
  </item>
  </section>
</report>
END_XML

require 'nokogiri'
doc = Nokogiri::XML(xml)
content = doc.search('item').map { |i| 
  i.search('data').map { |d| d.text }
}

content.each do |c|
  puts c.join(',')
end

# >> Request Name I,628
# >> Request Name II,793

Notice that Nokogiri allows use of CSS accessors, which I'm using here, in addition to the standard XPath accessors. The actual parsing took the middle four lines.

Ruby's got a built-in CSV generator/parser, but for this quick 'n dirty example I didn't use it.

In Python:

import csv
import xml.etree.ElementTree as ET

result = []
tree = ET.parse('test.xml')
# <section> is a direct child of the root <report> element
section = tree.getroot().find('section')
for item in section.findall('item'):
    # one CSV row per <item>: the text of each of its <data> children
    row = [rec.text for rec in item.findall('data')]
    result.append(row)

# newline='' lets the csv module control line endings itself
with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(result)
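
Neither snippet above covers the multi-file part of the question, where each input file becomes its own column. A minimal sketch of that step, using only the Python standard library, might look like the following. The function names read_report and write_summary are made up for illustration, and the code assumes every item carries one data element named "reportt" and one named "avgttlb", as in the sample XML. It selects those two by their name attribute, so the other data elements elided in the question are skipped.

```python
import csv
import xml.etree.ElementTree as ET


def read_report(path):
    """Return {request name: average time} for one report file.

    Assumes the <item>/<data name="..."> layout shown in the question.
    """
    timings = {}
    for item in ET.parse(path).getroot().iter('item'):
        # Pick the two <data> elements by their name attribute, so any
        # other <data> siblings inside the item are ignored.
        name = item.find("data[@name='reportt']")
        avg = item.find("data[@name='avgttlb']")
        if name is not None and avg is not None:
            timings[(name.text or '').strip()] = (avg.text or '').strip()
    return timings


def write_summary(paths, out_path):
    """Write one row per request and one column per input file."""
    reports = [read_report(p) for p in paths]
    # Keep request names in first-seen order across all files.
    requests = list(dict.fromkeys(name for r in reports for name in r))
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        header = ['Request'] + ['File %d' % (i + 1) for i in range(len(paths))]
        writer.writerow(header)
        for name in requests:
            # Leave the cell empty if a file has no entry for this request.
            writer.writerow([name] + [r.get(name, '') for r in reports])
```

Calling write_summary(['run1.xml', 'run2.xml'], 'output.csv') (file names hypothetical) would then produce the Request,File 1,File 2 layout from the question. Collecting each file into a dictionary first means the files do not need to list their requests in the same order.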