I'm looking for an efficient way to store many key->value pairs on disk for persistence, preferably with some caching.
The features needed are to either append to the value (concatenate) for a given key, or to let the model be key -> list of values; both options are fine. The value part is typically a binary document.
I will not have much use for clustering, redundancy, etc. in this scenario.
Language-wise we're using Java, and we are experienced with classic databases (Oracle, MySQL and more).
I see a few obvious options and would like advice on which is fastest in terms of stores (and retrievals) per second:
1) Store the data in classic DB tables using standard inserts.
2) Do it yourself, using a file system tree to spread the data over many files, one or several per key (see the sketch after this list).
3) Use some well-known tuple storage. Obvious candidates are:
3a) Berkeley DB Java Edition
3b) Modern NoSQL solutions like Cassandra and similar
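To make option 2 concrete, here is roughly what I have in mind (a minimal sketch; the SHA-256 hashing, two-level fan-out and file naming are just illustrative choices):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // One file per key; the key is hashed and the first hex digits pick
    // two levels of subdirectories so no single directory grows too large.
    public class FileTreeStore {
        private final Path baseDir;

        public FileTreeStore(Path baseDir) {
            this.baseDir = baseDir;
        }

        private Path pathFor(String key) throws NoSuchAlgorithmException {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            // e.g. baseDir/ab/cd/abcdef....bin
            return baseDir.resolve(hex.substring(0, 2))
                          .resolve(hex.substring(2, 4))
                          .resolve(hex + ".bin");
        }

        // "Concatenate" semantics: append the new value to the key's file.
        public void append(String key, byte[] value)
                throws IOException, NoSuchAlgorithmException {
            Path file = pathFor(key);
            Files.createDirectories(file.getParent());
            Files.write(file, value,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        public byte[] read(String key)
                throws IOException, NoSuchAlgorithmException {
            return Files.readAllBytes(pathFor(key));
        }
    }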
Personally I like Berkeley DB JE for this task.
To summarize my questions:
Does Berkeley DB seem like a sensible choice given the above?
What kind of speed can I expect for operations such as updates (inserting, or adding a new value for a key) and retrievals by key?
You could also give Chronicle Map or JetBrains Xodus a try; both are embeddable Java key-value stores that are much faster than Berkeley DB JE (if you are really looking for speed). Chronicle Map provides an easy-to-use java.util.Map interface.
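A persisted map can be created roughly like this (a minimal sketch assuming the Chronicle Map builder API; the file name, entry count and size hints are placeholders to tune for your real data):

    import net.openhft.chronicle.map.ChronicleMap;
    import java.io.File;
    import java.io.IOException;

    public class ChronicleMapExample {
        public static void main(String[] args) throws IOException {
            // Chronicle Map needs up-front hints about the number of
            // entries and typical key/value sizes to lay out its storage.
            try (ChronicleMap<String, byte[]> docs = ChronicleMap
                    .of(String.class, byte[].class)
                    .name("doc-store")
                    .entries(1_000_000)
                    .averageKey("typical-key")
                    .averageValue(new byte[8192]) // ~size of a binary document
                    .createPersistedTo(new File("docs.dat"))) {
                docs.put("some-key", new byte[]{1, 2, 3});
                byte[] doc = docs.get("some-key"); // plain java.util.Map calls
            }
        }
    }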
BerkeleyDB sounds sensible. Cassandra would also be sensible, but is perhaps overkill if you don't need redundancy, clustering, etc.
That said, a single Cassandra node can handle around 20k writes per second on relatively modest hardware, provided that you use multiple clients to exploit the high concurrency within Cassandra.
FWIW, I'm using Ehcache with completely satisfactory performance; I've never tried Berkeley DB.
Berkeley DB JE should work just fine for the use case that you describe. Performance will vary, depending largely on how many I/Os are required per operation (and, as a corollary, how big the available cache is) and on the durability constraints that you define for your write transactions (i.e., does a commit have to be written all the way to disk or not?).
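Roughly, a store matching your key -> list-of-values model could be set up as follows (a minimal sketch; the environment directory, database name and durability level are placeholders to adjust). setSortedDuplicates(true) is what gives you multiple values per key, and the Durability setting is where the commit-to-disk trade-off above is made:

    import com.sleepycat.je.Cursor;
    import com.sleepycat.je.Database;
    import com.sleepycat.je.DatabaseConfig;
    import com.sleepycat.je.DatabaseEntry;
    import com.sleepycat.je.Durability;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;
    import com.sleepycat.je.LockMode;
    import com.sleepycat.je.OperationStatus;
    import java.io.File;

    public class BdbJeExample {
        public static void main(String[] args) {
            File dir = new File("bdb-env");
            dir.mkdirs();

            EnvironmentConfig envConfig = new EnvironmentConfig();
            envConfig.setAllowCreate(true);
            envConfig.setTransactional(true);
            // Trade durability for speed: write the log on commit,
            // but do not force an fsync all the way to disk.
            envConfig.setDurability(Durability.COMMIT_WRITE_NO_SYNC);
            Environment env = new Environment(dir, envConfig);

            DatabaseConfig dbConfig = new DatabaseConfig();
            dbConfig.setAllowCreate(true);
            dbConfig.setTransactional(true);
            dbConfig.setSortedDuplicates(true); // multiple values per key
            Database db = env.openDatabase(null, "docs", dbConfig);

            // Each put with the same key adds another value for that key.
            DatabaseEntry key = new DatabaseEntry("some-key".getBytes());
            db.put(null, key, new DatabaseEntry(new byte[]{1, 2, 3}));
            db.put(null, key, new DatabaseEntry(new byte[]{4, 5, 6}));

            // Iterate over all values stored under the key.
            Cursor cursor = db.openCursor(null, null);
            DatabaseEntry data = new DatabaseEntry();
            OperationStatus status = cursor.getSearchKey(key, data, LockMode.DEFAULT);
            while (status == OperationStatus.SUCCESS) {
                // process data.getData() ...
                status = cursor.getNextDup(key, data, LockMode.DEFAULT);
            }
            cursor.close();

            db.close();
            env.close();
        }
    }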
Generally speaking, we typically see 50-100K reads per second and 5-12K writes per second on commodity hardware with BDB JE. Obviously, YMMV.
Performance tuning and throughput questions about BDB JE are best asked on the Berkeley DB JE forum, where there is an active community of BDB JE application developers on hand to help out. There are several useful performance tuning recommendations in the BDB JE FAQ which may also come in handy.
Best of luck with your implementation. Please let us know if we can help.
Regards,
Dave -- Product Manager for Berkeley DB