I am trying to build a distributed task queue, and I am wondering whether any data store has some or all of the following properties. I want a completely decentralized, multi-node/multi-master, self-replicating datastore cluster to avoid any single point of failure.
Essential
- Supports Python pickled objects as values.
- Persistent.
The more of these, the better, in decreasing order of importance (I do not expect any datastore to meet all the criteria :-)):
- Distributed.
- Synchronous Replication across multiple nodes supported.
- Runs/Can run on multiple nodes, in multi-master configuration.
- Datastore cluster exposed as a single server.
- Round-robin access to/selection of a node for read/write action.
- Decent Python client.
- Support for Atomicity in get/put and replication.
- Automatic failover
- Decent documentation and/or Active/helpful community
- Significantly mature
- Decent read/write performance
Any suggestions would be much appreciated.
Cassandra (open-sourced by Facebook) has pretty much all of these properties. There are several Python clients, including pycassa.
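On the pickled-value requirement: here is a minimal sketch, assuming the store accepts opaque byte strings as values. The `store` dict, `put_task`, and `get_task` are hypothetical stand-ins for illustration only; they are not pycassa's API, and a real client call would take the place of the dict access.

```python
import pickle

# Hypothetical stand-in for a key/value client. A real datastore client
# would replace this dict; nothing here is pycassa's actual API.
store = {}

def put_task(key, task):
    """Serialize a task with pickle and store the resulting bytes."""
    store[key] = pickle.dumps(task, protocol=pickle.HIGHEST_PROTOCOL)

def get_task(key):
    """Fetch the stored bytes and rebuild the original object."""
    return pickle.loads(store[key])

# Round-trip an example task description through the stand-in store.
task = {"id": 42, "func": "send_email", "args": ("user@example.com",)}
put_task("task:42", task)
restored = get_task("task:42")
```

Since the value is just bytes, any store that can hold binary values should work the same way. Only unpickle data written by your own workers, because unpickling untrusted bytes can execute arbitrary code.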
Edited to add:
Cassandra is fully distributed, multi-node P2P, with tunable consistency levels (i.e. your replication can be synchronous, asynchronous, or a mixture of both). Clients can connect to any server. Failover is automatic, and new servers can be added on the fly for load balancing. Cassandra is in production use by companies such as Facebook. There is an O'Reilly book. Write performance is extremely high, and read performance is also high.
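To make "tunable consistency" a bit more concrete, here is a rough sketch of the usual quorum arithmetic. It assumes a single data center with replication factor `n`, where a write waits for `w` replica acknowledgements and a read consults `r` replicas. The helper names `quorum` and `read_sees_latest_write` are mine, for illustration only, and are not part of any client library.

```python
def quorum(replication_factor):
    """Size of a majority of replicas: floor(n / 2) + 1."""
    return replication_factor // 2 + 1

def read_sees_latest_write(n, w, r):
    """True when every read set must overlap every write set.

    With n replicas, a write acknowledged by w of them and a read that
    consults r of them share at least one replica whenever r + w > n.
    """
    return r + w > n

n = 3  # assumed replication factor for this example

# Quorum writes plus quorum reads: the sets always overlap.
strong = read_sees_latest_write(n, quorum(n), quorum(n))

# Waiting for one replica on both sides: fast, but a read may be stale.
weak = read_sees_latest_write(n, 1, 1)

# Waiting for all replicas on write behaves like synchronous replication,
# so even a single-replica read overlaps the write set.
sync_write = read_sees_latest_write(n, n, 1)
```

Waiting for every replica on write corresponds to the synchronous end of the spectrum, and waiting for only one corresponds to the asynchronous end, with the remaining replicas updated in the background.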