Will caching data on the server improve performance?

I have a web service (WCF) and a MembershipProvider/RoleProvider to check credentials.

When the service is called, various methods call the providers to get a user, get a login name by Id, get an Id by login name, and so on. The end result: when looking in Profiler, I can see a lot of chatter.

I can easily incorporate caching into the MembershipProvider and RoleProvider so they cache users and don't hit the DB every time.
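
To illustrate, this is roughly what I have in mind (a sketch only; the wrapper class name and caching policy are placeholders):

    // Hypothetical caching wrapper around the stock SqlMembershipProvider.
    using System.Collections.Concurrent;
    using System.Web.Security;

    public class CachedMembershipProvider : SqlMembershipProvider
    {
        // username -> cached MembershipUser
        private static readonly ConcurrentDictionary<string, MembershipUser> Users =
            new ConcurrentDictionary<string, MembershipUser>();

        public override MembershipUser GetUser(string username, bool userIsOnline)
        {
            // userIsOnline updates LastActivityDate in the DB,
            // so only the offline path is safe to serve from cache.
            MembershipUser cached;
            if (!userIsOnline && Users.TryGetValue(username, out cached))
                return cached;

            MembershipUser user = base.GetUser(username, userIsOnline);
            if (user != null)
                Users[username] = user;
            return user;
        }

        public override bool DeleteUser(string username, bool deleteAllRelatedData)
        {
            // Keep the cache consistent on writes.
            MembershipUser removed;
            Users.TryRemove(username, out removed);
            return base.DeleteUser(username, deleteAllRelatedData);
        }
    }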

The user list is not big. I don't think it will ever be more than 100-200 users.

On one hand, I know SQL Server caches small tables and is designed to handle these selects. On the other hand, I SEE it in Profiler :) And memory on the web server side will be occupied. Plus, the lookups on the web server still have to be done (CPU/memory).

I guess I want to hear about your experience: should I even worry about this stuff? I placed the tags "strategically", so hopefully both DBAs and developers will give me some input :)


Absolutely, avoiding a round trip to the DB server pays off. It's not only the memory-cache issue; running a query, even a trivial one, is quite a complex process:

  • the connection has to be opened and authenticated. This gets amortized with connection pooling, true, but even a pooled connection still requires one extra round trip for sp_reset_connection at open time.
  • the request has to be composed and sent to the server.
  • a task needs to be allocated for the request and a worker has to pick up the task. Workers are a very precious resource in SQL Server, as there are so few of them.
  • SQL has to parse your query. At the very best it can skip the parsing, but it still needs to hash the input text; see Dynamically created SQL vs Parameters in SQL Server.
  • the query has to be executed, locks acquired, pages in memory looked up. Locking is especially expensive because it may conflict with other operations and has to wait. Using snapshot isolation can help to some (large) extent.
  • the result has to be marshaled back to the client.
  • the client has to parse the response metadata.
  • the client has to process the response.

A local in-memory lookup will win most times. Even a remote cache lookup like memcached will win over a DB query, no matter how trivial the query. So why not always cache locally? Because of one of the hardest problems in CS: cache coherency and invalidation. It is a problem that is way harder than you think it is right now, no matter how hard you think it is ;). You may look at SQL Server's own active cache-invalidation solution, Query Notifications, which works pretty well for fairly static result sets. I have a LINQ integration with Query Notifications myself, the LinqToCache project.
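
For reference, the ADO.NET entry point for Query Notifications is SqlDependency. A minimal sketch, assuming Service Broker is enabled on the database (the connection string and table/column names are placeholders; notifiable queries require an explicit column list and two-part table names):

    using System.Data;
    using System.Data.SqlClient;

    class UserCacheWithNotifications
    {
        static readonly string ConnString = "...";  // placeholder

        static DataTable _users;  // the cached result set

        public static void Start()
        {
            SqlDependency.Start(ConnString);  // one-time listener setup per AppDomain
            Reload();
        }

        static void Reload()
        {
            using (var conn = new SqlConnection(ConnString))
            using (var cmd = new SqlCommand(
                "SELECT UserId, UserName, IsApproved FROM dbo.Users", conn))
            {
                // Subscribe before executing; when the result set changes,
                // OnChange fires once and we re-query and re-subscribe.
                var dependency = new SqlDependency(cmd);
                dependency.OnChange += (s, e) => Reload();

                conn.Open();
                var table = new DataTable();
                table.Load(cmd.ExecuteReader());
                _users = table;  // reference swap; readers see old or new snapshot
            }
        }
    }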


Depends. What happens when you want to delete or disable a user's account and have the change take effect immediately?

You need to make sure that all modifications to user accounts are always reflected in the cache on all of your web servers. Ultimately, though, it is quite likely that avoiding the network I/O (which is what would really slow you down) would make a slight difference over hitting the DB each time, even if no individual user would notice it.
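
If you do cache anyway, one common compromise (a sketch of my own, not something this answer prescribes) is a short absolute expiration, so a disabled account takes effect within a bounded window on every server:

    using System;
    using System.Runtime.Caching;
    using System.Web.Security;

    static class ShortTtlUserCache
    {
        static readonly MemoryCache Cache = MemoryCache.Default;
        static readonly TimeSpan Ttl = TimeSpan.FromSeconds(30);  // staleness bound

        public static MembershipUser GetUser(string username)
        {
            var user = (MembershipUser)Cache.Get(username);
            if (user == null)
            {
                user = Membership.GetUser(username);  // DB hit at most ~once per TTL
                if (user != null)
                    Cache.Set(username, user, DateTimeOffset.UtcNow.Add(Ttl));
            }
            return user;
        }
    }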

Chances are though, it isn't worth it.


I have seen this strategy pay off tremendously, even for small tables that are accessed frequently. In our case we distributed the data to Express instances on each local web server, and the web apps would query their local copy instead of bothering the primary OLTP server, using network resources, etc. As your app grows and you add more web servers, the benefit of taking this much read activity and load away from the read/write database only gets larger.

This was a custom solution, so we could dictate how stale each table could get, and whether a data change pushed to the primary server required immediate distribution to the clients or could wait until the next scheduled distribution.
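
In sketch form, the routing side of such a setup can be as simple as this (the connection-string names "LocalReplica" and "Primary" are illustrative, not from our actual solution):

    using System.Configuration;
    using System.Data.SqlClient;

    static class ConnectionRouter
    {
        // Reads go to the local Express copy on this web server;
        // writes always go to the primary OLTP server.
        public static SqlConnection OpenForRead()  { return Open("LocalReplica"); }
        public static SqlConnection OpenForWrite() { return Open("Primary"); }

        static SqlConnection Open(string name)
        {
            string cs = ConfigurationManager.ConnectionStrings[name].ConnectionString;
            var conn = new SqlConnection(cs);
            conn.Open();
            return conn;
        }
    }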


We have an HTTP-based service with tons of requests, and every request needs to be authenticated (90% are very small requests, the rest pretty big). Even after putting the DB on the same machine (24 GB RAM, fast disks, etc.) we could improve scalability by 100% by implementing a cache (based on ConcurrentDictionary), thus being able to serve double the requests with the same machine.

To make changes to users/logins etc. take immediate effect, we implemented a webservice (SOAP) which is used for this kind of maintenance and keeps the cache updated in a write-through manner.
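
Roughly, the shape is this (a simplified sketch; the type and member names are illustrative, not our actual code):

    using System.Collections.Concurrent;

    public class UserRecord
    {
        public string Login;
        public string PasswordHash;
        public bool Enabled;
    }

    public class CredentialCache
    {
        readonly ConcurrentDictionary<string, UserRecord> _cache =
            new ConcurrentDictionary<string, UserRecord>();

        // Hot path: lock-free read, no DB round trip.
        public bool TryAuthenticate(string login, string passwordHash)
        {
            UserRecord user;
            return _cache.TryGetValue(login, out user)
                && user.Enabled
                && user.PasswordHash == passwordHash;
        }

        // Called by the SOAP maintenance service after the DB write succeeds,
        // so the change takes effect immediately (write-through).
        public void WriteThrough(UserRecord updated)
        {
            _cache[updated.Login] = updated;
        }

        public void Remove(string login)
        {
            UserRecord removed;
            _cache.TryRemove(login, out removed);
        }
    }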

So all in all we are really happy with caching.


I've built a number of big web sites using various ASP.NET providers. I've written them against Azure Table Storage, Oracle, LDAP servers, and various custom back-ends.

I personally wouldn't worry about scale for a while. Databases are fast and reliable. As soon as you start caching this data you'll have many problems to worry about:

  1. Cache Invalidation. A user is fired, and you update the DB via an out-of-band process. How do you update your webserver cache?
  2. Bugs. While caching isn't hard, the fewer lines of code you have to write the better.
  3. Security. There are security issues you'll probably overlook.
  4. Time. Work on customer features. It's much more important.

Optimize if/when you need to. Until then, add features to make your stuff better.

If you're worried about scale, put the User/Role tables in a separate DB from the rest of your data. If you start to outgrow that server, you can easily migrate your user data to a larger DB.
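
In web.config terms that just means giving the providers their own connection string (a sketch; the names and connection string are placeholders):

    <connectionStrings>
      <add name="UsersDb"
           connectionString="Data Source=...;Initial Catalog=Users;Integrated Security=True"
           providerName="System.Data.SqlClient" />
    </connectionStrings>
    <system.web>
      <membership defaultProvider="SqlProvider">
        <providers>
          <add name="SqlProvider"
               type="System.Web.Security.SqlMembershipProvider"
               connectionStringName="UsersDb"
               applicationName="/" />
        </providers>
      </membership>
      <roleManager enabled="true" defaultProvider="SqlRoleProvider">
        <providers>
          <add name="SqlRoleProvider"
               type="System.Web.Security.SqlRoleProvider"
               connectionStringName="UsersDb"
               applicationName="/" />
        </providers>
      </roleManager>
    </system.web>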
