I have a Java application that processes data of this kind:
class MyData
{
    Date date;
    double one;
    double two;
    String comment;
}
All data is stored in CSV format on the hard disk. The maximum size of such a data sequence is ~150 MB, and at the moment I just load it fully into memory and work with it.
Now I have the task of increasing the maximum data sequence size to hundreds of gigabytes. I guess I need to use a DB, but I have not worked with them before.
My questions:
- Which DB is the better choice for my needs (there will be only 1 table with data as above)?
- Which library is better for connecting Java <-> DB?
- I guess something like a cursor will be used? If so, is there any cursor implementation with good record caching for fast access?
Any other tips & tricks about Java <-> DB are welcome!
Your question is pretty unspecific. There isn't a best of breed - it depends on how much money you have and what kind of hardware.
Since your mapping between Java and the DB is pretty simple, JDBC should be enough. JDBC will create a cursor for you as necessary; just loop over the rows in the ResultSet. Depending on the database, you may need to configure it to use cursors, though.
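As a rough illustration of "loop over the rows in the ResultSet", here is a minimal sketch using only standard java.sql calls. The table and column names (my_data, recorded_at, one, two, comment_text), the fetch size, and the JDBC URL passed on the command line are assumptions made for the example, not anything from the question. How setFetchSize is honoured varies by driver: some only stream rows through a cursor when auto-commit is off or when extra connection settings are used, so check your driver's documentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class StreamRows {
    // Hypothetical table and column names mirroring the MyData fields.
    static final String QUERY =
        "SELECT recorded_at, one, two, comment_text FROM my_data ORDER BY recorded_at";

    /** Sums the "one" column while holding only a window of rows in memory. */
    static double sumOne(Connection conn, int fetchSize) throws SQLException {
        double sum = 0.0;
        try (PreparedStatement ps = conn.prepareStatement(
                QUERY, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // A hint to the driver: fetch this many rows per round trip.
            ps.setFetchSize(fetchSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    sum += rs.getDouble("one");
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("usage: StreamRows <jdbc-url>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            // Some drivers only use a server-side cursor outside auto-commit mode.
            conn.setAutoCommit(false);
            System.out.println("sum(one) = " + sumOne(conn, 1000));
        }
    }
}
```

Because only java.sql interfaces are used, the same code should work against another database by changing the JDBC URL and putting a different driver on the classpath.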
Since you mention "hundreds of gigabytes", that rules out most of the "simple" databases. If you have money, try Oracle. If you don't have money, try MySQL or Postgres.
You can also try JavaDB (also known as Derby). But I'm not sure the performance will be what you need.
Note that they all have their quirks and "features", so expect to spend a couple of weeks to find your way with them.
Depends entirely on what you will be doing with the data. Do you need to index it to retrieve specific records, or are you stream processing the entire data set to generate some statistics (for example)? Does the database need to be accessed concurrently by multiple clients/processes?
Don't rush immediately towards SQL/JDBC. Relational databases are powerful, but they add a lot of complexity and are often entirely unnecessary for the task at hand.
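For the stream-processing case mentioned above, a database may not be needed at all: the CSV file can be read one line at a time so that memory use stays constant regardless of file size. The sketch below computes a running count and mean of the "one" column. It assumes a column order of date,one,two,comment, no header row, and no quoted commas in the first three fields; the class and method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;

class CsvStreamStats {
    /** Result holder: number of rows seen and the mean of column "one". */
    static final class Stats {
        final long count;
        final double mean;
        Stats(long count, double mean) { this.count = count; this.mean = mean; }
    }

    /** Reads lines lazily; only the current line is held in memory. */
    static Stats summarize(BufferedReader in) {
        long count = 0;
        double mean = 0.0;
        Iterator<String> lines = in.lines().iterator();
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // Assumed layout: date,one,two,comment (the comment may contain commas).
            String[] fields = line.split(",", 4);
            double one = Double.parseDouble(fields[1]);
            count++;
            mean += (one - mean) / count; // incremental mean, no list of values kept
        }
        return new Stats(count, mean);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: CsvStreamStats <file.csv>");
            return;
        }
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]))) {
            Stats s = summarize(in);
            System.out.println(s.count + " rows, mean(one) = " + s.mean);
        }
    }
}
```

This only helps when each pass touches the data sequentially; random access to specific records is where an index, and therefore some kind of database, starts to pay off.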
Again, depending on what you actually need to do, something like BerkeleyDB may fit the bill, or you may just need a more compact binary message format: check out Protocol Buffers and Kryo.
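To give a feel for what a more compact binary record looks like, here is a minimal sketch that uses plain java.io data streams as a stand-in for Protocol Buffers or Kryo (it is not either library's API). The field order and the choice to store the date as epoch milliseconds are assumptions made for the example; MyData is re-declared so the snippet is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Date;

class BinaryRecords {
    /** Same fields as the MyData class in the question. */
    static final class MyData {
        final Date date;
        final double one;
        final double two;
        final String comment;
        MyData(Date date, double one, double two, String comment) {
            this.date = date; this.one = one; this.two = two; this.comment = comment;
        }
    }

    /** Encodes one record: 8-byte timestamp, two 8-byte doubles, length-prefixed comment. */
    static byte[] encode(MyData d) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(d.date.getTime());
            out.writeDouble(d.one);
            out.writeDouble(d.two);
            out.writeUTF(d.comment); // writeUTF is limited to 65535 encoded bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes a record written by encode(), reading fields in the same order. */
    static MyData decode(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            Date date = new Date(in.readLong());
            double one = in.readDouble();
            double two = in.readDouble();
            String comment = in.readUTF();
            return new MyData(date, one, two, comment);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = encode(new MyData(new Date(), 1.5, 2.5, "sample"));
        MyData back = decode(raw);
        System.out.println(raw.length + " bytes, comment = " + back.comment);
    }
}
```

In a real file the same DataOutputStream would wrap a buffered FileOutputStream and records would be appended one after another. A dedicated serialization library additionally handles schema changes, which this sketch does not.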
If you really need to scale things up, look at Hadoop/HDFS for distributed processing (but that's getting rather complicated).
Oh, and generally speaking, JavaDB/Derby tends to suck somewhat.
I would recommend JavaDB. I have used it in a Point of Sale system and it works very well. It is very easy to integrate into your Java application, and you can bundle it into the same .jar file if you want.
Using Java DB in Desktop Applications may be a useful article. You will use JDBC to interface with the database from Java, which makes it easy to switch to another database if you don't want to use JavaDB.
You'll want to evaluate several databases (you can get trials of just about any of them if they're not open source/free already). I'd recommend trying Oracle and MySQL/Postgres, and given the size of your data (and its apparent lack of complexity) you might want to consider a data grid as well (GridGain or similar).
Definitely prototype though.
I'd just like to add that the "fastest" database is not necessarily the best.
You also need to take into account:
- reliability,
- software license cost,
- ease of use,
- ease of administration,
- availability of support,
- and so on.