Can anyone explain scenarios where Project Voldemort or similar key-value stores are useful?


I can see myself using Project Voldemort to cache the results of a traditional RDBMS query. But in that case it offers almost no major advantage over other (Java) caching systems such as Ehcache, JCache, etc.

Where else could I use Project Voldemort or similar key-value stores? How are you using them in your business applications?


One approach to improving the speed of your database is to denormalize. Take this MySQL example:

CREATE TABLE `users` (
    `user_id` INT NOT NULL AUTO_INCREMENT,
    … -- Additional user data
    PRIMARY KEY (`user_id`)
);


CREATE TABLE `roles` (
    `role_id` INT NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(64),
    PRIMARY KEY (`role_id`)
);


CREATE TABLE `users_roles` (
    `user_id` INT NOT NULL,
    `role_id` INT NOT NULL,
    PRIMARY KEY (`user_id`, `role_id`)
);

Neat, tidy, normalized. But if you want to get users and their roles, the query is complex:

SELECT u.*, r.*
  FROM `users` u
  LEFT JOIN `users_roles` ur ON u.`user_id` = ur.`user_id`
  LEFT JOIN `roles` r ON ur.`role_id` = r.`role_id`;

If you denormalized this, it might look something like:

CREATE TABLE `users` (
    `user_id` INT NOT NULL AUTO_INCREMENT,
    `role` VARCHAR(64),
    … -- Additional user data
    PRIMARY KEY (`user_id`)
);

And the equivalent query would be:

SELECT * FROM `users`;

This improves some of the performance characteristics of your queries:

  1. Because the result you want is already in a table, you don't have to perform read-side calculations. For example, to count the users holding a given role in the normalized schema, you'd need a GROUP BY and a COUNT; in the denormalized design, you'd store that figure in a separate table devoted to roles and their user counts, maintained at write time (see the sketch after this list).
  2. The data you want is in the same place, and hopefully in the same place on disk. Rather than requiring many random seeks, you can do one to a few sequential reads.
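
As an illustration of point 1, here is a minimal sketch of maintaining such a per-role count at write time against a Voldemort store. The bootstrap URL and the "role_counts" store name are invented for the example, and the read-modify-write is simplified (a real version would have to handle concurrent updates and version conflicts):

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.versioning.Versioned;

public class RoleCountWriter {
    public static void main(String[] args) {
        // Placeholder bootstrap URL and store name for this sketch.
        StoreClientFactory factory = new SocketStoreClientFactory(
                new ClientConfig().setBootstrapUrls("tcp://localhost:6666"));
        StoreClient<String, String> counts = factory.getStoreClient("role_counts");

        // Write time: a user just gained the "admin" role, so bump the
        // stored count instead of recomputing GROUP BY/COUNT at read time.
        Versioned<String> current = counts.get("admin");
        int n = (current == null) ? 0 : Integer.parseInt(current.getValue());
        counts.put("admin", Integer.toString(n + 1));

        // Read time is now a single key lookup.
        System.out.println("admin users: " + counts.getValue("admin"));
    }
}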

NoSQL DBs are highly optimized for these cases, where you want to read a mostly-static dataset sequentially. At that point the database is just moving bytes from disk to the network: less work, less overhead, more speed. Restrictive as that sounds, it's usually possible to model your data and application so that this access pattern feels natural.
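
To make that concrete, here is a sketch of the denormalized users example on top of a Voldemort store: the whole user record, role included, is serialized into a single value, so the read that needed a join becomes one key lookup. The store name and the JSON-ish value format are assumptions for illustration:

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;

public class DenormalizedUserStore {
    public static void main(String[] args) {
        StoreClientFactory factory = new SocketStoreClientFactory(
                new ClientConfig().setBootstrapUrls("tcp://localhost:6666"));
        // Key = user_id, value = the entire denormalized user record.
        StoreClient<String, String> users = factory.getStoreClient("users");

        // One write stores the record with the role folded in, instead of
        // spreading it across the users/roles/users_roles tables.
        users.put("42", "{\"name\": \"alice\", \"role\": \"admin\"}");

        // The three-table join from above becomes a single lookup.
        System.out.println(users.getValue("42"));
    }
}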

The trade-off for this performance is write load, disk space, and some application complexity. Denormalized data means more copies, which means more disk space and more write load; essentially, you end up with one dataset per query. And because you shift the burden of those computations from read time to write time, you really need some sort of asynchronous mechanism to do the work, hence the added application complexity.
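
That asynchronous mechanism can be as simple as a background executor that fans one logical write out to every denormalized copy. A minimal sketch, with invented store names and no retry or failure handling:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;

public class WriteFanOut {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final StoreClient<String, String> usersById;
    private final StoreClient<String, String> usersByRole;

    public WriteFanOut(StoreClientFactory factory) {
        // One store (one dataset) per query pattern.
        this.usersById = factory.getStoreClient("users_by_id");
        this.usersByRole = factory.getStoreClient("users_by_role");
    }

    public void saveUser(String userId, String role, String record) {
        // Shift the computation to write time: update every copy in the
        // background so the caller doesn't block on N writes.
        writer.submit(() -> {
            usersById.put(userId, record);
            usersByRole.put(role + ":" + userId, record);
        });
    }

    public void shutdown() {
        writer.shutdown(); // let queued writes drain, then stop the thread
    }

    public static void main(String[] args) {
        StoreClientFactory factory = new SocketStoreClientFactory(
                new ClientConfig().setBootstrapUrls("tcp://localhost:6666"));
        WriteFanOut fanOut = new WriteFanOut(factory);
        fanOut.saveUser("42", "admin", "{\"name\": \"alice\"}");
        fanOut.shutdown();
    }
}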

And because you have to store more copies, you have to perform more writes. This is why you can't practically replicate this kind of architecture with a SQL database – it's extremely difficult to scale writes.

In my experience, the trade-off is well worth it for a large-scale application. If you'd like to read a bit more about a practical application of Cassandra, I wrote this piece a few months ago, and you might find it helpful.


Project Voldemort is part of the NoSQL movement. Trends in computer architecture are pushing databases toward horizontal scalability, and NoSQL attempts to address that requirement.

Among the claimed benefits of such key/value stores is the ability to blow through enormous amounts of data without the overhead of a traditional RDBMS.
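
The API surface of these stores is correspondingly small. Basic usage of Voldemort's Java client is just gets and puts; the following is adapted from the project's quickstart, with a placeholder bootstrap URL and store name:

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.versioning.Versioned;

public class VoldemortQuickstart {
    public static void main(String[] args) {
        StoreClientFactory factory = new SocketStoreClientFactory(
                new ClientConfig().setBootstrapUrls("tcp://localhost:6666"));
        StoreClient<String, String> client = factory.getStoreClient("my_store");

        client.put("some_key", "some_value");

        // Values come back wrapped in a Versioned, carrying the vector
        // clock Voldemort uses to detect conflicting updates.
        Versioned<String> versioned = client.get("some_key");
        versioned.setObject("updated_value");
        client.put("some_key", versioned);
    }
}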

http://www.computerworld.com/s/article/9135086/No_to_SQL_Anti_database_movement_gains_steam_
