Friends,
We will be undertaking a knowledge preservation project to scan more than 1 million books. We need suggestions on implementing a database for storing and retrieving metadata, and for tracking the scanning status of each object (book).
Should we go for SQL or NoSQL? The metadata could vary from project to project; this project, for example, could have 15 fields.
We are thinking of something based on Lucene/Solr or a scalable RDF database.
Is there any open source solution that lets us define custom metadata fields and store the information with a search feature?
Disclaimer: Never attempted this type of project
I have seen very good performance from SQL Server's "Filestream" type. It uses the NTFS file APIs for storing binary data, and keeps a pointer in the rows of your table.
If the metadata has no structure you could use XML, but if it does have a repeating structure, put it into relational tables so that you can use indexing and similar features to get the performance you need.
Filestream Type
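To make the relational suggestion concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It assumes fields that every book shares (title, author) go into ordinary indexed columns, while per-project custom fields go into a name/value table. The table names, column names, and sample values are all hypothetical, and SQLite is only a stand-in for whichever database you pick.

```python
import sqlite3


def create_schema(conn):
    # Fixed fields live in "book"; project-specific fields live in "book_field".
    conn.executescript(
        """
        CREATE TABLE book (
            id     INTEGER PRIMARY KEY,
            title  TEXT NOT NULL,
            author TEXT
        );
        CREATE TABLE book_field (
            book_id INTEGER NOT NULL REFERENCES book(id),
            name    TEXT NOT NULL,
            value   TEXT,
            PRIMARY KEY (book_id, name)
        );
        -- Indexes so lookups by title or by custom field avoid full scans.
        CREATE INDEX idx_book_title ON book(title);
        CREATE INDEX idx_field_name_value ON book_field(name, value);
        """
    )


def add_book(conn, title, author, extra_fields):
    # Insert the fixed columns, then one row per custom metadata field.
    cur = conn.execute(
        "INSERT INTO book (title, author) VALUES (?, ?)", (title, author)
    )
    book_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO book_field (book_id, name, value) VALUES (?, ?, ?)",
        [(book_id, name, value) for name, value in extra_fields.items()],
    )
    return book_id


def find_by_field(conn, name, value):
    # Return (id, title) of every book whose custom field matches exactly.
    rows = conn.execute(
        """
        SELECT b.id, b.title
        FROM book b
        JOIN book_field f ON f.book_id = b.id
        WHERE f.name = ? AND f.value = ?
        ORDER BY b.id
        """,
        (name, value),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
add_book(conn, "Example Title A", "Author A", {"language": "en", "publisher": "P1"})
add_book(conn, "Example Title B", "Author B", {"language": "fr"})
matches = find_by_field(conn, "language", "en")
```

A name/value table is only one way to handle fields that change from project to project; an XML or JSON column is another, at the cost of weaker indexing in most databases.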
A solution like this can be created using any database and some custom code, but is probably made easier by using a CMS (content management system). CMS solutions hide the details of the underlying database and let you work with an extensible set of metadata for describing your documents.
Which CMS you use will depend on your budget, in-house expertise and your needs, amongst other factors. I have been using Alfresco (commercial open source), partly because my company had already decided on it, but if I were to do a low-budget website I might consider the non-Enterprise version. Alfresco also leverages Lucene for search.
If your needs are very basic, then a database for the metadata, a filesystem for the images and some code for your server should be sufficient. Avoid storing the images in the database, since in my experience this is not what databases do best.
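As a rough illustration of the basic setup described above, the following Python sketch keeps only metadata, a scan status, and a directory path in the database, while the page images themselves are written to the filesystem. Everything here is an assumption made for the example: the table layout, the status values, the directory naming scheme, and the use of SQLite and a temporary directory in place of a real server and storage volume.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_db():
    # One row per book: metadata, scan status, and where its images live.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE scan (
            book_id   INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            status    TEXT NOT NULL DEFAULT 'pending'
                      CHECK (status IN ('pending', 'scanned', 'failed')),
            image_dir TEXT
        )
        """
    )
    return conn


def register_book(conn, book_id, title):
    # New books start in the 'pending' state with no images yet.
    conn.execute(
        "INSERT INTO scan (book_id, title) VALUES (?, ?)", (book_id, title)
    )


def record_scan(conn, root, book_id, pages):
    # Write page images to disk, then store only the directory path in the row.
    book_dir = Path(root) / f"{book_id:07d}"
    book_dir.mkdir(parents=True, exist_ok=True)
    for number, data in enumerate(pages, start=1):
        (book_dir / f"page_{number:04d}.tif").write_bytes(data)
    conn.execute(
        "UPDATE scan SET status = 'scanned', image_dir = ? WHERE book_id = ?",
        (str(book_dir), book_id),
    )
    return book_dir


def count_by_status(conn):
    # Summarise progress, e.g. how many books are still pending.
    return dict(conn.execute("SELECT status, COUNT(*) FROM scan GROUP BY status"))


conn = open_db()
register_book(conn, 1, "Example Title A")
register_book(conn, 2, "Example Title B")
with tempfile.TemporaryDirectory() as root:
    book_dir = record_scan(conn, root, 1, [b"placeholder page bytes"])
    stored = sorted(p.name for p in book_dir.iterdir())
counts = count_by_status(conn)
```

Keeping a path rather than the image bytes in each row keeps the table small, and the status column doubles as the scan-tracking mechanism asked about in the question.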