I have a quick question about best practices and especially expected performance for the following scenario:
If I want to query data from multiple servers that contain schematically identical SQL databases, would having each server provide a web method that a single client application can consume be an appropriate (and relatively fast) solution?
The data just needs to be consolidated on the client end, where several web methods would have to be consumed serially (or in parallel?) to provide the data to the client. Each server would also be implementing Entity Framework as an ORM.
Performance is my main concern here: would it turn out excessively slow as we start to scale up to more and more servers?
The problem is not performance, it is reliability. As the number of services you need to query to return a response to the client increases, the reliability decreases. Say you have 99% availability for a database (less than 4 days of total downtime for maintenance, patches and upgrades through an entire year). If you need to query 5 databases, and they fail independently, your client sees an actual availability of only 95%: almost 18 days a year your site appears down. At 10 databases the availability is 90% (35 days down), and at 50 servers it plummets to 60%, which means your site is unusable.
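The arithmetic behind those figures can be checked with a short sketch. It assumes that every request needs all databases to be up and that the databases fail independently of each other, so the combined availability is simply the per-database availability raised to the number of databases. The function names are made up for illustration.

```python
DAYS_PER_YEAR = 365


def combined_availability(per_database: float, database_count: int) -> float:
    """Availability seen by a client that needs every database to be up.

    Assumes failures are independent, so the probabilities multiply.
    """
    return per_database ** database_count


def downtime_days(availability: float) -> float:
    """Expected days per year during which the site appears down."""
    return (1.0 - availability) * DAYS_PER_YEAR


for count in (1, 5, 10, 50):
    availability = combined_availability(0.99, count)
    print(f"{count:>2} databases: {availability:.1%} available, "
          f"about {downtime_days(availability):.0f} days down per year")
```

If failures are correlated (for example, all servers are patched in the same maintenance window), the real figure would be better than this estimate, but the trend is the same.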
This is why the driving force in these scale-out scenarios is reliability, which can be achieved only by decoupling the databases. The usual trick is to implement an asynchronous messaging bus for the databases to communicate, and have each request to the site connect only to its local shard, never querying more than one database per request.
For a more detailed explanation of how this is done at massive scale, see this presentation on how MySpace uses a SQL Server based messaging bus to achieve scalability.
This SIGMOD 2009 Keynote presents in turn how Facebook achieves similar results: Building Facebook: Performance at Massive Scale, using memcached and MySQL partitioning.
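The decoupling idea described in this answer (each request touches only its local shard, and anything destined for another database travels as an asynchronous message) could look roughly like the toy sketch below. Everything in it is an assumption made for illustration: the `Shard` class, the CRC32-based `shard_for` routing, and the in-memory `queue.Queue` that stands in for a durable messaging bus. It is not how MySpace or Facebook implement it.

```python
import queue
import zlib


class Shard:
    """Hypothetical stand-in for one database server."""

    def __init__(self, name: str):
        self.name = name
        self.sent = []              # rows written synchronously by local requests
        self.received = []          # rows applied later from the inbox
        self.inbox = queue.Queue()  # stand-in for a durable message queue

    def drain_inbox(self) -> int:
        """Apply pending messages; would run in the background, not per request."""
        applied = 0
        while True:
            try:
                message = self.inbox.get_nowait()
            except queue.Empty:
                return applied
            self.received.append(message)
            applied += 1


def shard_for(user_id: str, shards: list) -> Shard:
    """Pick the shard that owns a user (assumed: stable hash of the user id)."""
    return shards[zlib.crc32(user_id.encode("utf-8")) % len(shards)]


def send_message(sender: str, recipient: str, text: str, shards: list) -> Shard:
    """Handle one request: write to the sender's shard only, then enqueue."""
    local = shard_for(sender, shards)
    message = (sender, recipient, text)
    local.sent.append(message)
    # The recipient's shard is never queried here; it only gets a queued message,
    # so the request still succeeds if that database is temporarily down.
    shard_for(recipient, shards).inbox.put(message)
    return local
```

The point of the design is that the availability of a request depends on one database only; the cost is that the recipient's shard sees the message with some delay, once its inbox has been drained.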
If your schematically identical data is residing in different databases, then have you looked into table partitioning and storing all your data in one database? That might help.
In your current scenario I would suggest that you fetch the data from the different database servers using an ORM or ADO.NET and then merge it logically in your application.
In SQL Server 2005 there are ways to expose data through a web service, but I would not suggest that, because the web service itself adds a performance penalty: you are crossing an application boundary.
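The fetch-and-merge approach suggested above can be sketched as follows, with the per-server queries issued in parallel rather than serially. This is a minimal illustration in Python, not .NET code: `fetch_rows` is a hypothetical placeholder that returns canned rows where a real application would run its ORM or ADO.NET query, and the server names are invented.

```python
from concurrent.futures import ThreadPoolExecutor

# Canned data standing in for identical tables on different servers (made up).
_FAKE_TABLES = {
    "db1": [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}],
    "db2": [{"id": 1, "name": "gamma"}],
}


def fetch_rows(server: str) -> list:
    """Hypothetical per-server query; replace with a real database call."""
    return list(_FAKE_TABLES[server])


def fetch_all(servers: list, fetch=fetch_rows, max_workers: int = 8) -> list:
    """Query every server in parallel and merge the rows in the application."""
    merged = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `servers`, and re-raises
        # any exception from a worker when its result is consumed.
        for server, rows in zip(servers, pool.map(fetch, servers)):
            for row in rows:
                # Tag each row with its origin, since ids can collide across servers.
                merged.append({**row, "source": server})
    return merged


if __name__ == "__main__":
    for row in fetch_all(["db1", "db2"]):
        print(row)
```

With parallel queries the total latency is roughly that of the slowest server rather than the sum of all of them, but a single failing server still fails the whole call unless the caller decides to tolerate partial results.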