Looking for some opinions on this and trying to start pushing forward with a solid design for my next project. Imagine 5,000,000 individual pictures with accompanying lo-res preview images; each set of pictures belongs to a gallery, which belongs to a person. There are X number of people. A specialized version of Flickr, so to speak.
These are housed on a remote host with a web interface for viewing these pictures. There is also a desktop application to go along with it where you can upload pictures automatically to this remote host and enter the gallery details. The desktop app has preview images and information about each gallery and person. A desktop app that syncs with Flickr, so to speak.
I need to decide on two things: backend storage for the remote host and local storage for the desktop app. This is targeted for a Windows environment, so I was thinking that SQL Server Express would be a nice fit, but this project has grown quite a bit and that may only work for the desktop end.
The remote (web) server can be Windows or Linux, PHP or .Net -- I don't care as long as the technology fits. The question is how best to store all that data on the web server so that it can be easily indexed, quickly accessed, and, most importantly, easily backed up and restored in the event of a disaster. I am not worried about server configuration or disk space at this time, as long as the database solution supports something of a cloud computing scenario.
I'm thinking a No-SQL backend makes the most sense, storing the photos, galleries, and users as 'articles' rather than 'rows'. No-SQL seems more capable of growing with a cloud. On the flipside, Flickr has been advertised as using MySQL ...
Perhaps this is a more existential question than a real coding question, but I know of no better group to ask!
Having managed a stock photo site with well over 5 million photos, I can say that MySQL is certainly a viable option. Backup is easy if you use replication. Just stop a slave, copy its data, and then start it back up; it will catch up with the master once it is running again.
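To make the relational option concrete, here is a minimal sketch of the person / gallery / photo layout from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere. The table names, column names, and sample paths are assumptions for illustration, and it assumes the image files themselves live on disk with only their paths kept in the database.

```python
import sqlite3

# Hypothetical schema: one person has many galleries, one gallery has many photos.
SCHEMA = """
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE gallery (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    title     TEXT NOT NULL
);
CREATE TABLE photo (
    id           INTEGER PRIMARY KEY,
    gallery_id   INTEGER NOT NULL REFERENCES gallery(id),
    full_path    TEXT NOT NULL,   -- assumed: file stored outside the database
    preview_path TEXT NOT NULL    -- assumed: lo-res preview stored the same way
);
CREATE INDEX idx_gallery_person ON gallery(person_id);
CREATE INDEX idx_photo_gallery ON photo(gallery_id);
"""


def build_demo_db():
    """Create an in-memory database with the schema and a little sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO person (id, name) VALUES (1, 'Alice')")
    conn.execute(
        "INSERT INTO gallery (id, person_id, title) VALUES (1, 1, 'Holiday')"
    )
    conn.executemany(
        "INSERT INTO photo (gallery_id, full_path, preview_path) VALUES (?, ?, ?)",
        [
            (1, "full/0001.jpg", "preview/0001.jpg"),
            (1, "full/0002.jpg", "preview/0002.jpg"),
        ],
    )
    conn.commit()
    return conn


def previews_for_person(conn, person_id):
    """Return the preview paths of every photo owned by one person."""
    rows = conn.execute(
        """
        SELECT photo.preview_path
        FROM photo
        JOIN gallery ON gallery.id = photo.gallery_id
        WHERE gallery.person_id = ?
        ORDER BY photo.id
        """,
        (person_id,),
    )
    return [row[0] for row in rows]
```

Calling `previews_for_person(build_demo_db(), 1)` lists the previews for the sample person. The two indexes are there so that lookups by person and by gallery would not need full table scans as the photo table grows.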
MySQL full-text search isn't very good and can be slow, so you may want to look into a Lucene-based engine like Solr. Elasticsearch is also a good option for scalability.
If you are unsure of your data structure, then something like MongoDB may be a good solution. Keep in mind that Mongo has limits on aggregation (~10,000 records). On the other hand, it is one of the easiest to set up.
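For comparison, here is a rough sketch of what a gallery might look like as a single document in a store like MongoDB. It only builds a plain Python dict and serializes it to JSON; no database driver is involved. The field names and the `camera` attribute are hypothetical, chosen to show that individual photos can carry extra fields without a schema change.

```python
import json


def gallery_document(owner, title, photos):
    """Build one gallery as a nested document.

    `photos` is a list of (full_path, preview_path, extra_fields) tuples;
    extra_fields is a dict of optional per-photo attributes.
    """
    return {
        "owner": owner,
        "title": title,
        "photos": [
            {"full_path": full, "preview_path": preview, **extra}
            for full, preview, extra in photos
        ],
    }


doc = gallery_document(
    "alice",
    "Holiday",
    [
        ("full/0001.jpg", "preview/0001.jpg", {}),
        # Only this photo has the hypothetical extra attribute.
        ("full/0002.jpg", "preview/0002.jpg", {"camera": "hypothetical-model"}),
    ],
)

# Document stores typically accept and return JSON-like structures.
encoded = json.dumps(doc)
```

Embedding photos inside the gallery keeps a whole gallery in one read, but very large galleries would probably be better served by storing each photo as its own document that references the gallery.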
Anything can work in the cloud, so I think that requirement is a moot point. You can set up anything you want on EC2.