So the application we've got calls the API's of all the major carriers (UPS, FedEx, etc) for tracking data.
We save the most recent version of the XML feed we get from them in a TEXT field in a table in our database.
We really hardly ever (read, never so far) access that data, but have it "just in case."
It adds quite a bit of additional weight to the database. Right now a 200,000 row table is coming in at around 500MB...the large majority开发者_运维技巧 of which is compromised of all that XML data.
So is there a more efficient way to store all that XML data? I had thought about saving them as actual text/xml files, but we update the data every couple of hours, so wasn't sure if it would make sense to do that.
Assuming it's data there's no particular reason not to keep it in your database (unless it's impeding your backups). But it would be a good idea to keep it in a separate table from the actual data that you do need to read on a regular basis — just the XML, a FK back to the original table, and possibly an autonumbered PK column.
It has been my experience that the biggest trouble with TEXT/BLOB columns that are consistently large, is that people are not careful to prevent reading them when scanning many rows. On MyISAM, this will waste your VFS cache, and on InnoDB, it will waste your InnoDB buffer pool.
A secondary problem is that as tables get bigger, they become harder to maintain.. adding a column or index can rebuild the whole table, and a 500MB table rebuilds a lot slower than a 5MB table.
I've had good success moving things like this off into offline key/value storage such as MogileFS, and/or TokyoTyrant.
If you don't need to be crazy scalable, or you must value transactional consistency over performance, then simply moving this column into another table with a 1:1 relationship with the original table will at least require a join to blow out the buffer pool, and allow you to maintain the original table w/o having to tip-toe around the 500MB gorilla.
if its really unused, try:
/dev/null
I don't know what kind of data these XML streams contain, but maybe you can parse it and store only the pertinent info in a table or set of tables that way you can eliminate some of the XML's bloat.
Learn about OLAP techniques and data warehouses. They are probably what are you looking for.
As a DATAbase is designed to store DATA this seems to be the logical place for it. A couple of suggestions:
Rather than storing it in a seperate table is to use a seperate database. If the information isn't critical
Have a look athe the compress and uncompress functions as this could reduce the size of the verbose XML.
I worked on one project where we split data between the database and file system. After this experience I vowed never again. Backups and maintenance of various production/test/dev environments turned into a nightmare.
Why not store them to text files, and them keep a simple path (or relative path) in the database?
We used to do something similar in the seismic industry where the bulk of the data were big arrays of floating point numbers. Much more efficient to store these as files on disk (or tape), and then only keep trace meta data (position/etc) in a RDBMS-like database (I at about the time they were porting to Oracle!). Even with the old system, the field data was always on disk and easily accessible - it was used more frequently than the array data (although, unlike in your case, this was most definitely essential!
精彩评论