So there's this new cool thing, these NoSQL databases. And then there's my data: rows upon rows of meteorological data, values representing certain measurements at a certain station (identified by a WMO number, not coordinates) at a certain time.
Not every station measures every parameter, not every parameter is measured all the time.
I currently store this data (30 years' worth of hourly values, resulting in ~1 billion values) in MySQL. The continuous growth and the foreseeable addition of even more data are giving me a bit of a headache.
Reading about the document-based NoSQL systems, which seem to scale rather easily, I was wondering whether NoSQL is a viable storage concept for meteorological data too. Do you have any experience with this?
Update: I forgot to mention the typical queries. Most of them need data along the time axis, e.g. give me the temperatures of station 066310 from 01.01.2010 00:00 to 01.03.2010 00:00.
Or: give me the most recent values of all parameters of a particular station.
NoSQL can be a good fit when your data structure is simple and predictable (for example, a plain key-value store) and you need neither relational integrity nor ad-hoc or advanced querying.
What you win in easy scalability you might lose in flexibility and consistency though.
The biggest problem would be the lack of an easy means of composing complex queries over your data. I would say meteorological data is not the best candidate for NoSQL.
I personally prefer PostgreSQL over MySQL and find it very scalable (even with millions or even billions of rows) when set up correctly.
I think you should try a full-featured and mature DBMS before giving up on SQL.
See for instance:
http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Performance_Lie/
http://www.yafla.com/dforbes/The_Impact_of_SSDs_on_Database_Performance_and_the_Performance_Paradox_of_Data_Explodification/
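To make the "try a proper RDBMS first" suggestion a bit more concrete, here is a minimal sketch of how the two typical queries from the question could look against one narrow table. Everything in it is an assumption for illustration: the table name `measurement`, the column names, the sample values, the ISO 8601 text timestamps, and the half-open time interval (start included, end excluded). SQLite is used only because it ships with Python; the same composite-key idea should carry over to MySQL or PostgreSQL, but this is not a tuned production schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL/PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical narrow schema: one row per (station, parameter, time).
# A parameter a station never measures simply has no rows, so no NULL columns.
conn.execute("""
    CREATE TABLE measurement (
        station   TEXT NOT NULL,   -- WMO number, e.g. '066310'
        parameter TEXT NOT NULL,   -- e.g. 'temperature'
        ts        TEXT NOT NULL,   -- ISO 8601, so text order = time order
        value     REAL NOT NULL,
        PRIMARY KEY (station, parameter, ts)
    )
""")

# Made-up sample rows, only there to exercise the queries.
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        ("066310", "temperature", "2009-12-31T23:00", -1.5),
        ("066310", "temperature", "2010-01-01T00:00", -2.0),
        ("066310", "temperature", "2010-02-15T12:00", 3.4),
        ("066310", "temperature", "2010-03-01T00:00", 5.1),
        ("066310", "pressure", "2010-02-15T12:00", 1013.2),
        ("066700", "temperature", "2010-01-10T06:00", 0.3),
    ],
)


def time_range(conn, station, parameter, start, end):
    """Values of one parameter at one station for start <= ts < end."""
    return conn.execute(
        "SELECT ts, value FROM measurement "
        "WHERE station = ? AND parameter = ? AND ts >= ? AND ts < ? "
        "ORDER BY ts",
        (station, parameter, start, end),
    ).fetchall()


def latest_values(conn, station):
    """Most recent value of every parameter the station has reported."""
    return conn.execute(
        "SELECT m.parameter, m.ts, m.value FROM measurement m "
        "JOIN (SELECT parameter, MAX(ts) AS ts FROM measurement "
        "      WHERE station = ? GROUP BY parameter) last "
        "  ON last.parameter = m.parameter AND last.ts = m.ts "
        "WHERE m.station = ? "
        "ORDER BY m.parameter",
        (station, station),
    ).fetchall()
```

Calling `time_range(conn, "066310", "temperature", "2010-01-01T00:00", "2010-03-01T00:00")` corresponds to the example query in the question, reading 01.03.2010 as 1 March 2010. Both queries are served by the composite primary key, which is the main reason to order it as station, parameter, time.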
I find it hard to create a coherent answer right now, but here goes.
- Your data would fit without problems in a "nosql" datastore such as Cassandra (and probably many more).
- You would benefit from the schema-less design of many "nosql" solutions, seeing as not all columns (to use a MySQL term) are present all the time.
- The time-based queries would be no problem in Cassandra (check out TimeUUID-based keys).
- You don't seem to be taking advantage of the relational part of MySQL, so you wouldn't be hurt much by losing it.
- That said, you might be just fine with MySQL. You don't describe any concrete problems; are you actually having any? (Just being interested is totally cool.)
- Indexes and search are things you would have to implement manually in many "nosql" datastores. If this scares you, perhaps stick with SQL.
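To illustrate the data model I have in mind, here is a rough sketch in plain Python. It is not Cassandra client code and uses no Cassandra API: the class `WideRowStore` and its methods are made up, and plain `datetime` objects stand in for TimeUUID keys. The idea it emulates is one row per (station, parameter) whose columns are kept sorted by time, so a time range becomes a contiguous slice.

```python
from bisect import bisect_left, insort
from collections import defaultdict
from datetime import datetime


class WideRowStore:
    """Toy in-memory stand-in for a wide-row store (hypothetical, not Cassandra)."""

    def __init__(self):
        # Row key (station, parameter) -> list of (timestamp, value), kept sorted.
        # A row only exists once a parameter has actually been measured.
        self._rows = defaultdict(list)

    def put(self, station, parameter, ts, value):
        """Insert one measurement, keeping the row sorted by timestamp."""
        insort(self._rows[(station, parameter)], (ts, value))

    def slice(self, station, parameter, start, end):
        """Return all (timestamp, value) pairs with start <= timestamp < end."""
        cols = self._rows.get((station, parameter), [])
        # A 1-tuple (ts,) sorts before any (ts, value) with the same ts,
        # so bisect_left finds the first column at or after the bound.
        lo = bisect_left(cols, (start,))
        hi = bisect_left(cols, (end,))
        return cols[lo:hi]

    def latest(self, station):
        """Most recent (timestamp, value) per parameter for one station."""
        return {
            parameter: cols[-1]
            for (st, parameter), cols in self._rows.items()
            if st == station and cols
        }


# Made-up sample data, inserted out of order on purpose.
store = WideRowStore()
store.put("066310", "temperature", datetime(2010, 2, 15, 12), 3.4)
store.put("066310", "temperature", datetime(2010, 1, 1, 0), -2.0)
store.put("066310", "temperature", datetime(2010, 3, 1, 0), 5.1)
store.put("066310", "pressure", datetime(2010, 2, 15, 12), 1013.2)

january_to_march = store.slice(
    "066310", "temperature", datetime(2010, 1, 1), datetime(2010, 3, 1)
)
```

Note that `latest` has to scan all row keys here, which is exactly the sort of lookup you would have to index yourself in many of these stores.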
Thanks for listening ;)