As Hypertable and HBase seem to be the two major open source BigTable implementations, what are the major pros and cons between these two databases?
In addition, what are the major pros and cons between BigTable and SQL RDBMSes, and what significant differences can I expect between writing a project with a traditional RDBMS like Postgres and Hypertable?
At the risk of broadening your second question more than I should (I've never played with BigTable, but I've toyed with MongoDB and CouchDB)...
The most important difference, in so far as I've understood it anyway, is that traditional RDBMSs typically use a row-based store, whereas BigTable-style NoSQL engines use a column-based store. The pros and cons mostly derive from this point.
http://en.wikipedia.org/wiki/Column-oriented_DBMS
The major consideration that I tend to keep in mind is ACID compliance: a NoSQL engine is typically eventually consistent, rather than always consistent. Think of it as storage that behaves like a website's cache: the cache is normally valid and consistent, but occasionally slightly outdated or inconsistent.
There's no right or wrong here: for some use-cases (e.g. a search engine, a blog), slightly inconsistent is a very acceptable option; for others (e.g. a bank, a billing system) it is not. (I tend to work on stuff that needs atomicity.)
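To make the cache analogy concrete, here is a toy sketch in Python. The class and method names are made up for illustration and don't correspond to any real NoSQL API; the only point is that a read from a replica can lag behind a write until replication catches up.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on the primary at once, replicas lag until sync()."""

    def __init__(self, n_replicas=2):
        self.primary = {}
        self.replicas = [dict() for _ in range(n_replicas)]
        self.pending = []  # writes not yet propagated to the replicas

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, index, key):
        # May return an outdated value (or None) if sync() has not run yet
        return self.replicas[index].get(key)

    def sync(self):
        # Stand-in for background replication
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.read_from_replica(0, "balance")  # replica has not caught up yet
store.sync()
fresh = store.read_from_replica(0, "balance")  # replica now agrees with primary
```

For a blog post counter, reading `stale` for a moment is harmless. For an account balance, it is exactly the kind of thing you can't tolerate.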
Then, there are plenty of performance considerations that boil down to implementation details.
An immediate consequence of striving for eventual consistency is that integrity checks and so forth are typically done in the app rather than the data store (i.e. there are no triggers or stored procedures to speak of). Your data store ends up with less work to do, which results in obvious performance benefits.
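As a rough illustration of what "done in the app" means, here is a minimal sketch. `KeyValueStore` and `save_order` are hypothetical names, not part of any real library; the checks stand in for what an RDBMS would enforce with a foreign key and a CHECK constraint.

```python
class KeyValueStore:
    """Hypothetical schemaless store: it accepts whatever it is given."""

    def __init__(self):
        self.rows = {}

    def put(self, key, row):
        self.rows[key] = row


def save_order(store, known_customers, order_id, order):
    # The application does the work a foreign key would do in an RDBMS
    if order["customer_id"] not in known_customers:
        raise ValueError("unknown customer: %r" % order["customer_id"])
    # ...and the work of a CHECK (quantity > 0) constraint
    if order["quantity"] <= 0:
        raise ValueError("quantity must be positive")
    store.put(order_id, order)


store = KeyValueStore()
customers = {"c1"}
save_order(store, customers, "o1", {"customer_id": "c1", "quantity": 2})
```

The store itself never rejects anything, which is where the speed comes from. The flip side is that every code path that writes orders has to remember to go through `save_order`.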
A column-based store means that if you update a single column from your document, you only invalidate that column. A row-based store, by contrast, invalidates the entire row. Depending on how you typically update your data (i.e. just a few columns vs most of them), the cost of either approach can add up.
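A deliberately simplified sketch of that difference, using two made-up in-memory classes (real engines are far more involved; this only counts how many cells a single-column update rewrites under each layout):

```python
class RowStore:
    """Each key maps to one record holding every column."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, record):
        self.rows[key] = dict(record)

    def update(self, key, column, value):
        # The whole record is rewritten as a new version
        new_record = dict(self.rows[key])
        new_record[column] = value
        self.rows[key] = new_record
        return len(new_record)  # number of cells written


class ColumnStore:
    """Each column is stored separately as a key -> value map."""

    def __init__(self):
        self.columns = {}

    def insert(self, key, record):
        for column, value in record.items():
            self.columns.setdefault(column, {})[key] = value

    def update(self, key, column, value):
        # Only the one affected column is touched
        self.columns[column][key] = value
        return 1  # number of cells written

    def read_row(self, key):
        # Reassembling a full row has to visit every column
        return {c: vals[key] for c, vals in self.columns.items() if key in vals}


user = {"name": "Ann", "email": "ann@example.com", "visits": 1}

row_store = RowStore()
row_store.insert("u1", user)
row_cells = row_store.update("u1", "visits", 2)

col_store = ColumnStore()
col_store.insert("u1", user)
col_cells = col_store.update("u1", "visits", 2)
```

Here `row_cells` comes out as 3 and `col_cells` as 1. Note that `read_row` on the column store pays the opposite price: it has to touch every column to rebuild one record.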
A flip side of a column-based store is that it makes joins trickier (from an implementation standpoint). In overly simplistic terms, think of it as having an EAV table per column; this works fine for a few tables. It's a different story if you need a big report that requires a dozen joins on sales or stocks (which a good RDBMS will handle just fine).
A more experienced user will hopefully chime in on NoSQL sharding and replication. On this I'd only feel comfortable enough to point out that Postgres has had built-in replication features since 9.0 and is quite good at dealing with queries that span multiple partitions.
Anyway... To cut a very long story short: unless you already know that you'll need to instantly scale to petabytes and gazillions of requests in multitudes of data centers in your next project, I think the only consideration that you should have in mind when picking an SQL or NoSQL implementation is whether you absolutely need ACID compliance or not.
Lastly, if your main interest lies in trying a new toy, consider trying a graph-oriented database instead. These potentially combine the benefits of row- and column-based stores.