My Website: A search intensive location based social network. So at a high level are components like we see on facebook - profiles, feeds etc. On a low level I am drilling down right to reservations at hotels across the world, restaurants etc. So lots of data, lots of searches, lots of analytic, lots of reads/writes.
Current Platform: 1 MySQL database, Php Codeingiter, 1 Dedicated hosting server. Website is geo-location so world over, support multi lang and localization, must be real time. Plan to add CDN once launched. This will change once i finalize the exact technologies to use.
Here are the list of concerned items:
1. Website searc开发者_JS百科hing: Photos / videos (name, description, people tagged in it), user defined tags, comments (like wall comments), posts, blogs, group, people searching by name/email.Mail searches: searching subject, email content, sender, attachments.
Storing Basic user/system values: User details, system details, schema, etc.
Storing & implementing Live feeds: Real time feeds based on user activities.
Storing & implementing Analytic: In house developed web analytic for system reporting + user analytic for business pages. This includes mixture of reports/graphs/metrics. So this will be a custom data warehouse.
Storing & implementing Relationships: Find, maintain and show users degree in relationships, common items between various degrees.
Handling API calls so businesses can send/receive data like hotel/restaurant owners, etc
QUESTION - Can anyone suggest: Database to use - type and which brand exactly (relational, document, key value, graph, etc), database engine to use if rdms (InnoDB may not work in all cases), add on servers/file systems/cache like memcached, etc? Should i go normalized or de-normalized if rdms. Or NOSQL all the way.
MySQL works for some parts, memcache works for some, lucene works for some parts, some parts like inbox may require a document database, relationships may require a graph database but I am not sure which one works exactly for which of the 7 items above and if i can use the same platforms/technologies for most of the above items. My only requirement is open source so it can be free to use and work with PHP. I don't want to implement a separate database/set of technologies to support each of the above 7 requirements. Ofcourse being a social network, performance and scalability are important too.
If you have the bucks then Oracle will support most of your requirments. which really come down to standard RDBMS, plus CLOBs plus full text search.
MS SQLserver will also support these features but you are limited to Windows host.
If you are doing this with open source I would seriously look at PostGres as MySql's future looks uncertain now its owned by the world largest comercial database vendor.
Well, FourSquare is doing most of this with MongoDB, so it must have something going for it.
I don't want to start any holy wars here (though I guess it may not sound like it), but don't use MySQL, just... don't. Besides, these days it's looking more likely that Oracle is trying to kill it. Oracle itself would be a giant waste of money for something like this.
If you want to stay with a relational model, take a look at VoltDB, that's been making some noises as a SQL db that's actually horizontally scalable.
Personally I would start with a combination of Mongo, Lucene, and Hadoop/HBase for the data crunching (analytics, relationships, etc). But really, it would just be an excuse to play with shiny new toys, I don't claim to have much experience with these.
I would seriously rethink PHP as well, but here I go again with the holy wars.
First if you think the site will grow to anything like other successful sites you want to scale horizontally, so you'll need a distributed solution. Which means some sort of NoSQL solution. But you don't have to choose a single NoSQL solution, more and more you're seeing whats termed a polyglot approach - multiple db's to handle specific aspects. Seem too complicated? Probably not compared to trying to scale an ill-fitting technology into your architecture. So store the objects in Cassandra or Mongodb, which provides excellent scale and performance. Then feed the relationship data into a distributed graph database to handle the network links. You'll have a nice blend of technologies that will be more scalable than a SQL database. But you'll need to review the technical requirements of the various technologies on your own, too many decisions to be made to make a product recommendation.
精彩评论