
When is BIG, big enough for a database?

I'm developing a Java application that has performance at its core. I have a list of some 40,000 "final" objects, i.e., I have an initialization input data of 40,000 vectors. This data is unchanged throughout the program's run.

I am always performing lookups against a single ID property to retrieve the proper vectors. Currently I am using a HashMap over a sub-sample of 1,000 vectors, but I'm not sure it will scale to production.

When is BIG actually big enough to use a DB? One more thing: an SQLite DB is a viable option, as no concurrency is involved, so I guess the "threshold" for DB use is perhaps lower.


I think you're asking whether a HashMap with 40,000 entries will be okay. The answer is yes - unless you really don't have enough memory, that should be absolutely fine. If you're writing a performance-sensitive app, then putting a large amount of fast memory in the machine running it is likely to be an efficient way of boosting performance anyway.

There won't be very much overhead for each HashMap entry, so if you've got enough space to store the objects themselves in memory, it's unlikely that the overhead of the map would cause a problem.

Is there any reason why you can't just test this with a reasonable amount of data?

If you really have no more requirements than:

  • Read data at start-up
  • Put data in a map by a single ID (no need for joins, queries against different fields, substring matches etc)
  • Fetch data from map

... then using a full-blown database would be a huge amount of overkill, IMO.
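
For reference, a minimal sketch of that read-once / look-up-many pattern might look like the following (the Vector value class, its fields, and the class name VectorLookup are made up for illustration):

    import java.util.HashMap;
    import java.util.Map;

    public class VectorLookup {

        // Hypothetical immutable value object standing in for one of the 40,000 vectors.
        static final class Vector {
            final long id;
            final double[] values;

            Vector(long id, double[] values) {
                this.id = id;
                this.values = values;
            }
        }

        private final Map<Long, Vector> byId;

        // Build the index once at start-up; the data never changes afterwards.
        VectorLookup(Iterable<Vector> vectors) {
            Map<Long, Vector> map = new HashMap<>();
            for (Vector v : vectors) {
                map.put(v.id, v);
            }
            this.byId = map;
        }

        // O(1) expected-time lookup by the single ID property.
        Vector get(long id) {
            return byId.get(id);
        }
    }

Since the data never changes after start-up and no concurrency is involved, the map can be built once and read from freely afterwards.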


As long as you're loading the data set into memory at the beginning of the program, keeping it there, and you don't have any complex queries, some sort of serialization/deserialization seems more feasible than a full-blown database.
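
As an illustration, a rough sketch of that approach using plain built-in Java serialization could look like this (storing the vectors as double[] and the file name are just assumptions for the example):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.util.ArrayList;

    class PersistExample {

        // Write the whole data set to disk once.
        static void save(ArrayList<double[]> vectors, File file) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(vectors);
            }
        }

        // Read it back in one go at start-up.
        @SuppressWarnings("unchecked")
        static ArrayList<double[]> load(File file) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
                return (ArrayList<double[]>) in.readObject();
            }
        }
    }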


You could start using a DB with as little as 100 entries (or fewer). There is no general rule for when the amount of data is large enough to store in a database. It's more a question of whether storing the data in a database gives you any benefit (a performance boost, easier programming, more flexible options for your users).

When the benefits are greater than the cost of implementation, put it in a database.


There is no set size for a Collection vs. a Database. It depends heavily on what you want to do with the data; size is less important.

You can have a Map with a billion entries.


There's no such thing as 'big enough for a database'. The question is whether there are enough advantages in using a database to overcome the costs.

Having said that, 40,000 isn't 'big' ;-) Unless the objects are huge or you have complex query requirements, I would start with an in-memory implementation. But if you expect to scale this number up over time, it might be better to use the database from the beginning.


One option that you might want to consider is the Oracle Berkeley DB Java Edition library. It's a simple JAR file that can read/write data to persistent storage. Because of its small footprint and ease of use, it's used for applications running on small to very large data sets. It's designed to be linked into the application, so that it's embedded and doesn't require a complex client/server installation or protocol stack.

What's even better is that it's extremely scalable (which works well if you end up with larger data sets than you expect), is very fast, and supports both a Java Collections API and a Direct Persistence Layer API (POJO-like). So you can use it seamlessly with Java Collections.
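
As a rough illustration of the Direct Persistence Layer style, a sketch along these lines might work, assuming the Berkeley DB JE jar (the com.sleepycat.je and com.sleepycat.persist packages) is on the classpath; the VectorEntity class, its fields, and the directory name are hypothetical:

    import java.io.File;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;
    import com.sleepycat.persist.EntityStore;
    import com.sleepycat.persist.PrimaryIndex;
    import com.sleepycat.persist.StoreConfig;
    import com.sleepycat.persist.model.Entity;
    import com.sleepycat.persist.model.PrimaryKey;

    public class BdbExample {

        // Hypothetical POJO-like entity; @PrimaryKey marks the single ID used for lookups.
        @Entity
        static class VectorEntity {
            @PrimaryKey
            long id;
            double[] values;
        }

        public static void main(String[] args) throws Exception {
            // The environment home directory must exist before opening it.
            File dir = new File("vector-db");
            dir.mkdirs();

            EnvironmentConfig envConfig = new EnvironmentConfig();
            envConfig.setAllowCreate(true);
            Environment env = new Environment(dir, envConfig);

            StoreConfig storeConfig = new StoreConfig();
            storeConfig.setAllowCreate(true);
            EntityStore store = new EntityStore(env, "vectors", storeConfig);

            PrimaryIndex<Long, VectorEntity> index =
                    store.getPrimaryIndex(Long.class, VectorEntity.class);

            // Write one entity and read it back by ID, much like a map.
            VectorEntity v = new VectorEntity();
            v.id = 42L;
            v.values = new double[] {1.0, 2.0, 3.0};
            index.put(v);

            VectorEntity loaded = index.get(42L);

            store.close();
            env.close();
        }
    }

The point is simply that the store is embedded in the application process and keyed access reads much like a map lookup, so there is no separate server to install or manage.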

Berkeley DB Java Edition was designed specifically with Java application developers in mind. It's designed to be simple to use and lightweight in terms of resources required, yet very fast, scalable and reliable.

You can find more information about Oracle Berkeley DB Java Edition here.

Regards,

Dave
