开发者

Is SQLite a good choice for a large, read only database for research?

开发者 https://www.devze.com 2023-02-05 12:07 出处:网络
I have a large number of records (say around 10 to 100 million), which I want to be able to query. This is a research project, the database is going to be mostly read only, and I only need one conne

I have a large number of records (say around 10 to 100 million), which I want to be able to query.

This is a research project, the database is going to be mostly read only, and I only need one connection at a time. I would like the queries to be reaso开发者_开发问答nably fast.

Is SQLite a reasonable choice for this purpose?


My experience with SQLite is that it may be quite slow on large recordsets, depending on how you structure your queries. If your data is de-normalized and you can get by querying a single table against its primary key then it's acceptably fast, but if your data is fully normalized and your queries involve several joins then it can be much slower than a client-server database.

SQLite's principal advantage is its small size and single file nature that make it easy to distribute embedded in an app. As that doesn't seem to be a requirement for you though, I think you'd be better off going with something else. SQL Server Express is good if you're using Windows, MySQL or Postgres otherwise would be a good choice.


As pointed out in the previous posts, SQLite is a great SQL library, but it can run out of gas when the data set gets very large. Berkeley DB recently introduced a SQL API which is completely SQLite compatible. It was added to Berkeley DB in order to provide the best of both worlds to SQLite users -- the ubiquity, simplicity and ease of use of SQLite with the concurrency, scalability and reliability of Berkeley DB.

The Berkeley DB SQL API was designed to be a drop-in replacement for SQLite applications, especially those that specifically need the Berkeley DB features and scalability that isn't available in native SQLite. You can read more about it in the Berkeley DB SQL API documentation.

Disclaimer: I'm one of the Product Managers for Berkeley DB, so I'm a little biased. But your use case is one of the reasons that we worked with Dr. Hipp and the SQLite developers in order to combine the SQLite API with the Berkeley DB storage manager. It allows SQLite application developers to take their applications into new areas with added capabilities, while remaining compatible with their existing implementation.

Please let us know if you have any questions or if there is anything that we can do to help. You can find an active community of Berkeley DB developers on the OTN Forums.

Best of luck with your project.

Regards,

Dave


SQLite is not particularly fast when getting into the millions of entries. Results will vary according to what you put in there, schema, number of columns, indexes.

The advantage (especially in your case) of SQLite is that it is so light that trying it with some data would probably be worth the time and effort. It's very much straightforward and its ideal use case is indeed for single user access.

I'd say try and build it up with a representative amount of data (you can do an import from a CSV file from the command line, or use one of the many wrappers available out there). If the speed is not satisfactory, you might have to switch to something with more power, but admittedly a bit more setup too, like MySQL.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号