I would like to parse XML files into a MySQL DB.
What is the fastest, most efficient, and least resource-intensive way to do this on a Linux system (Ubuntu)?
I have about 1 GB worth of XML files that I need to parse every 15 minutes. Each XML file is about 60 KB.
I was thinking about using shell or Perl and either building the parser myself or using some sort of XML tool.
I am open to any suggestions.
Well, the fastest non-validating XML parser I have seen so far is VTD-XML. It can be used wherever performance is paramount. To give some numbers: on a 2.5 GHz Core 2, VTD-XML outperforms DOM parsers by 5x~12x, delivering 150~250 MB/sec sustained throughput per core.
At that rate, 1 GB of XML can be parsed in about 10 seconds. If the parse is successful, you can randomly walk the in-memory data structure or use XPath to extract data.
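For illustration, here is a minimal VTD-XML sketch in Java. The file name and the XPath /orders/order/id are hypothetical placeholders for your own document structure:

    import com.ximpleware.*;

    public class VtdDemo {
        public static void main(String[] args) throws Exception {
            VTDGen vg = new VTDGen();
            // parseFile() returns false on failure; second arg = namespace awareness
            if (!vg.parseFile("input.xml", false)) {
                throw new RuntimeException("XML parse failed");
            }
            VTDNav vn = vg.getNav();             // cursor over the in-memory structure
            AutoPilot ap = new AutoPilot(vn);
            ap.selectXPath("/orders/order/id");  // XPath over the parsed document
            int i;
            while ((i = ap.evalXPath()) != -1) { // -1 signals no more matches
                int t = vn.getText();            // index of the element's text token
                if (t != -1) System.out.println(vn.toNormalizedString(t));
            }
        }
    }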
Given your requirements (1 GB of XML input), you have to take into account that VTD-XML will use roughly 1.3~1.5 GB of your system RAM, because it builds an in-memory data structure to access the parsed data and also keeps the XML text itself in memory.
The VTD-XML library is available for C#, Java, C++, and C. It has a learning curve, so it takes some time to get used to, but in the long term it may start paying off.
If available memory isn't enough, a streaming parser (SAX) is a more appropriate tool for the job.
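A minimal SAX sketch with the JDK's built-in parser, which uses constant memory regardless of file size; the element name "record" is a hypothetical placeholder:

    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class SaxDemo {
        public static void main(String[] args) throws Exception {
            DefaultHandler handler = new DefaultHandler() {
                private final StringBuilder text = new StringBuilder();
                @Override public void startElement(String uri, String local,
                                                   String qName, Attributes atts) {
                    text.setLength(0);           // reset the buffer for each element
                }
                @Override public void characters(char[] ch, int start, int len) {
                    text.append(ch, start, len); // text may arrive in several chunks
                }
                @Override public void endElement(String uri, String local, String qName) {
                    if ("record".equals(qName)) {
                        // a full row is available here -- hand it to your DB writer
                        System.out.println(text.toString().trim());
                    }
                }
            };
            SAXParserFactory.newInstance().newSAXParser()
                            .parse(new java.io.File("input.xml"), handler);
        }
    }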
Would it be a better option to insert the XML files into a MySQL table and then extract the values using MySQL's XML functions? You may refer to the link below:
http://rpbouman.blogspot.com/2006/03/importing-xml-data-into-mysql-using.html
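As a rough sketch of that approach in Java/JDBC, assuming MySQL 5.1+ (where ExtractValue() is built in); the staging table xml_staging(doc MEDIUMTEXT) and the XPath /order/id are hypothetical:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.*;
    import java.sql.*;

    public class XmlInDb {
        public static void main(String[] args) throws Exception {
            String xml = new String(Files.readAllBytes(Paths.get("input.xml")),
                                    StandardCharsets.UTF_8);
            try (Connection c = DriverManager.getConnection(
                     "jdbc:mysql://localhost/test", "user", "pass")) {
                // store the raw XML document in a staging table
                try (PreparedStatement ins = c.prepareStatement(
                         "INSERT INTO xml_staging (doc) VALUES (?)")) {
                    ins.setString(1, xml);
                    ins.executeUpdate();
                }
                // pull values back out with MySQL's XPath function
                try (Statement st = c.createStatement();
                     ResultSet rs = st.executeQuery(
                         "SELECT ExtractValue(doc, '/order/id') FROM xml_staging")) {
                    while (rs.next()) System.out.println(rs.getString(1));
                }
            }
        }
    }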
The fastest way to get lots of data into MySQL is using LOAD DATA INFILE to load data from a file into a key-less table in PK order. Considering that you might very well be limited by DB performance on this, it may be worthwhile to pay the price of converting the XML to a CSV (or similar) file for LOAD DATA INFILE and then slurping it into the DB in one go.
Cf. http://mysqldump.azundris.com/archives/94-LOAD-DATA-INFILE-and-mysqldump.html
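A hedged sketch of that pipeline in Java/JDBC. The table items, the TSV path, and the Connector/J URL option allowLoadLocalInfile are assumptions; LOAD DATA LOCAL INFILE also has to be permitted by the server:

    import java.sql.*;

    public class BulkLoad {
        public static void main(String[] args) throws Exception {
            // Step 1 (not shown): stream-parse the XML and append one
            // tab-separated line per row to /tmp/items.tsv, in PK order.
            try (Connection c = DriverManager.getConnection(
                     "jdbc:mysql://localhost/test?allowLoadLocalInfile=true",
                     "user", "pass");
                 Statement st = c.createStatement()) {
                // one bulk load instead of thousands of INSERT statements
                st.execute("LOAD DATA LOCAL INFILE '/tmp/items.tsv' " +
                           "INTO TABLE items " +
                           "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'");
            }
        }
    }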