We are thinking of centralizing configuration information, and ZooKeeper looks like a good choice. We are also interested in sharding and have a scheme in mind. On the PoweredBy page [1] I saw that Rackspace and Yahoo are using ZooKeeper for sharding. I would appreciate pointers and details.
[1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy
Solr is going to use ZooKeeper for sharding. The ZooKeeper Integration design doc might be interesting for you.
I can think of two things that they could be referencing.
They could be referencing the built-in ensemble features. Using those, you can set up a group management protocol for your service. As you add more servers to the ensemble, you effectively shard your pool out to greater numbers. The data is synced between the member servers of the ensemble. This is especially useful for applications that shard the same data set out to multiple read pools, such as index servers, search servers, read caches, etc.
They could be using ZooKeeper for configuration management. Suppose your application has thousands of clients that all need to update their config files at the same time. Say the application currently accesses a data storage layer of 50 servers, but that pool needs to be sharded out to 200. You can set up a 1-to-4 slaving relationship to get there. ZooKeeper could then be used to update the config, in essence changing every client's config file within a second of each other.
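As a rough sketch of the 50-to-200 example, the configuration pushed through ZooKeeper could be a single JSON document that maps each existing server to its four new replicas. The host names, the `build_shard_config` and `encode_config` helpers, and the JSON layout are all assumptions made for illustration; the resulting bytes are what one would store in a znode that every client watches.

```python
import json


def build_shard_config(primaries, fanout, version):
    """Map each existing server to `fanout` replica names (hypothetical naming scheme)."""
    mapping = {
        primary: [f"{primary}-r{i}" for i in range(1, fanout + 1)]
        for primary in primaries
    }
    return {"version": version, "replicas": mapping}


def encode_config(config):
    """Serialize to bytes, since znode payloads are byte strings."""
    return json.dumps(config, sort_keys=True).encode("utf-8")


# Hypothetical host names for the 50 existing storage servers.
primaries = [f"db{n:02d}" for n in range(1, 51)]
config = build_shard_config(primaries, fanout=4, version=2)
payload = encode_config(config)

total_replicas = sum(len(r) for r in config["replicas"].values())
print(len(config["replicas"]), total_replicas)  # prints: 50 200
```

With a Python client such as kazoo, the payload would typically be written to a znode with `set`, and each client would register a watch on that znode so it reloads the config when the data changes. Keeping the whole mapping in one znode means clients see the change as a single update rather than 50 separate ones.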
You should also take a look at how HBase uses ZooKeeper, specifically to maintain information about regions. This would be analogous to using ZooKeeper to maintain DB sharding info.
Another use is managing the shard lookup table. Since this lookup table has to be strongly consistent, this is where ZooKeeper comes into the picture.
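A minimal sketch of what such a lookup table could look like on the client side, assuming a range-based scheme in which each entry records the first key a shard is responsible for. The answer does not specify a scheme, so the JSON layout, the shard names, the `/app/shards` path mentioned in the comment, and both helper functions are hypothetical.

```python
import bisect
import json


def parse_lookup_table(raw):
    """Decode the znode payload into sorted range starts and matching shard names."""
    entries = sorted(json.loads(raw.decode("utf-8")), key=lambda e: e["start"])
    starts = [e["start"] for e in entries]
    shards = [e["shard"] for e in entries]
    return starts, shards


def find_shard(starts, shards, key):
    """Return the shard whose range start is the largest one <= key."""
    index = bisect.bisect_right(starts, key) - 1
    if index < 0:
        raise KeyError(f"no shard covers key {key!r}")
    return shards[index]


# Example payload, as it might be stored under a hypothetical znode such as /app/shards.
raw = json.dumps([
    {"start": 0, "shard": "shard-a"},
    {"start": 1000, "shard": "shard-b"},
    {"start": 5000, "shard": "shard-c"},
]).encode("utf-8")

starts, shards = parse_lookup_table(raw)
print(find_shard(starts, shards, 42))    # prints: shard-a
print(find_shard(starts, shards, 1000))  # prints: shard-b
print(find_shard(starts, shards, 7500))  # prints: shard-c
```

Because every client reads the table from the same znode, a single write there changes the routing for all of them, which is where the strong-consistency requirement is met.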