I have a PHP web service which can be called (from mobile phones) to perform certain tasks. For these tasks to be done, the caller must be "logged in." What is the best way to handle the authentication?
Currently, I'm just using SESSIONS. The client calls a login API, and then any other API it needs. But I'm concerned about the impact of having 200,000 people all calling this service and keeping all of those sessions. I'm not sure how the server will respond. Any tips? How is this typically handled? Like Facebook, Flickr, etc.
If this is being called by a custom client program (i.e. your mobile phones) and not the browser, why "log them in" at all? Instead, simply use HTTP authentication (either Digest, or Basic if you're going over SSL, or your own scheme) and "log them in" on every request.
Then you don't have to worry about sessions, about load balancing, and fail over, etc. Keep it stateless.
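As a rough illustration of "log them in every time", here is a minimal sketch of checking HTTP Basic credentials on each request. The function names and the $findHash lookup are hypothetical (they stand in for however you fetch a stored password hash), and it assumes passwords were stored with password_hash(). On some CGI/FastCGI setups the Authorization header may need to be forwarded explicitly before PHP_AUTH_USER is populated.

```php
<?php
/**
 * Resolve the caller from HTTP Basic credentials on every request.
 * Returns the username on success, or null if credentials are missing or wrong.
 *
 * $server   - normally $_SERVER
 * $findHash - hypothetical lookup: takes a username, returns the stored
 *             password_hash() string, or null for an unknown user
 */
function authenticateRequest(array $server, callable $findHash): ?string
{
    // PHP fills these in from an "Authorization: Basic ..." request header.
    $user = $server['PHP_AUTH_USER'] ?? null;
    $pass = $server['PHP_AUTH_PW'] ?? null;
    if (!is_string($user) || !is_string($pass)) {
        return null;
    }

    $hash = $findHash($user);
    if (!is_string($hash) || !password_verify($pass, $hash)) {
        return null;
    }
    return $user;
}

/** Send the 401 challenge that tells the client to retry with credentials. */
function denyRequest(): void
{
    header('WWW-Authenticate: Basic realm="api"');
    http_response_code(401);
}

// Possible wiring at the top of an endpoint (assumption, adapt to your code):
//   $user = authenticateRequest($_SERVER, $findHash);
//   if ($user === null) { denyRequest(); exit; }
```

No server-side state survives the request, so any machine behind the load balancer can answer it.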
Addenda:
Certainly, fewer hits to the DB are better, that's just a general rule. But at the same time, many hits to the DB are handled by cached pages on the DB server, or possibly application caches so that they never hit the DB server. So, in some cases, particularly single row queries against an indexed column, DB hits can be very cheap.
Now, one might ask: if they're both stored and readily accessed, what's really the difference between a cached bit of the database and a unique user session?
Well, primarily, the difference is in the contract with the data. A cached item has a lifespan directly proportional to the amount of memory you have and the amount of uncached activity happening. Give it a small amount of memory, and the cached item likely has a very short lifespan. Give it a lot of memory, and the cached item has a much better chance of hanging around. If the memory for cached data is large enough that repeated activity for that data keeps hitting the cache, the cache is a big win. If your cache is recycling so fast that nothing is ever "in" the cache, your cache has almost no value. But the point is that the system will work with or without the cache; the cache is simply a performance enhancement.
A session, however, has a different contract. Many sessions have a specific, minimum lifespan, typically measured in minutes: 10, 20, even 30 minutes.
That means that if a user hits your site just once, you must dedicate resources to that user even if he never comes back. You have to, otherwise the session offers effectively no value.
If you get a lot of traffic, you get a lot of new sessions to manage. In theory, under bad circumstances, sessions can spike without limit. If you suddenly get 10,000 hits on your site, you get to manage the remains of those hits for the minimum lifespan of your session. You have to dedicate resources (memory or disk) to them, you have to keep track of them, and then, inevitably, you have to clean them up.
A cache is a fixed resource. It only grows to the size you configure it. You have no obligation to keep anything in the cache, and as discussed earlier, the system will function with or without the cache. Caches naturally recycle. If you get that surge of 10,000 hits, they'll possibly roll your cache, but after that they leave no mark on your system. They can hit and be gone in 1 or 2 minutes, never to be seen again.
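To make the "fixed resource" contract concrete, here is a small sketch of a bounded cache with least-recently-used eviction. The class name and the $loader callback are hypothetical; the loader stands in for the real data source (for example a DB query). LRU is just one eviction policy chosen for illustration, not necessarily what your DB server or cache layer uses.

```php
<?php
/**
 * A tiny fixed-size cache with least-recently-used eviction.
 * It never holds more than $capacity items, and callers always get
 * a value back: on a miss it simply falls through to the loader.
 */
final class BoundedCache
{
    /** @var array<string|int, mixed> insertion order doubles as recency order */
    private array $items = [];
    private int $capacity;

    public function __construct(int $capacity)
    {
        if ($capacity < 1) {
            throw new InvalidArgumentException('capacity must be at least 1');
        }
        $this->capacity = $capacity;
    }

    /** Return the cached value for $key, loading it via $loader on a miss. */
    public function get(string $key, callable $loader)
    {
        if (array_key_exists($key, $this->items)) {
            // Hit: re-insert so the key moves to the "most recent" end.
            $value = $this->items[$key];
            unset($this->items[$key]);
            $this->items[$key] = $value;
            return $value;
        }

        // Miss: the system still works, it just pays for the real lookup.
        $value = $loader($key);
        $this->items[$key] = $value;

        if (count($this->items) > $this->capacity) {
            // Evict the least recently used entry (first key in order).
            reset($this->items);
            unset($this->items[key($this->items)]);
        }
        return $value;
    }

    public function size(): int
    {
        return count($this->items);
    }
}
```

A surge of one-off keys can push everything else out, but the cache never grows past its capacity and nothing has to be cleaned up afterwards, which is the difference from a session store.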
Finally, with sessions, you need to share them among your infrastructure so that they travel with the user if they hop from machine to machine (for whatever reason). Caches don't. Ideally you want to keep a user local to a set of resources, so that the caches can do their job, but the system works whether they move or stay (it just works better if they stay, because of the cache reuse). If you don't replicate your sessions, they don't work at all.
DB hits add up. They can be cheap, but they're never free. A session has its own costs as well, so it is important to consider both and how they apply within your architecture.
Currently, I'm just using SESSIONS. The client calls a login API, and any other API needed. But I'm concerned about the impact of having 200,000 people all calling this service and have all of those sessions.
By default, those sessions touch the disk, because session.save_handler is set to files. It is better for your system not to touch the disk (memory is much faster). You could use session_set_save_handler to store sessions somewhere other than files. For example, sessions could be stored in:
- redis (I like the predis client). A C extension would be even faster, but you will probably need root access to build and install it. If you have that many users, you should probably own or rent a VPS anyway. The nice folks at http://redistogo.com offer free plans (5 MB) if you can't install anything on the machine, but as mentioned, you should be able to install things if you really want performance.
- memcached
These in-memory stores also scale better. You should also use them to cache the rest of your database queries (MySQL?). Remember that touching the disk is very slow compared to just using memory.
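Here is a minimal sketch of what a custom save handler could look like. The KeyValueStore interface and both classes are hypothetical names invented for this example. ArrayStore is only an in-process stand-in so the sketch runs without a server; in practice a redis or memcached client would sit behind the same interface. Only SessionHandlerInterface and session_set_save_handler() are PHP's own.

```php
<?php
/** Hypothetical minimal key-value API that a redis/memcached client could back. */
interface KeyValueStore
{
    public function get(string $key): ?string;
    public function set(string $key, string $value, int $ttlSeconds): void;
    public function delete(string $key): void;
}

/** In-process stand-in for demonstration only; it is not shared between requests. */
final class ArrayStore implements KeyValueStore
{
    private array $data = [];

    public function get(string $key): ?string
    {
        if (!isset($this->data[$key])) {
            return null;
        }
        [$value, $expiresAt] = $this->data[$key];
        if ($expiresAt <= time()) {
            unset($this->data[$key]); // expired: behave as if it was never there
            return null;
        }
        return $value;
    }

    public function set(string $key, string $value, int $ttlSeconds): void
    {
        $this->data[$key] = [$value, time() + $ttlSeconds];
    }

    public function delete(string $key): void
    {
        unset($this->data[$key]);
    }
}

/** Session handler that keeps session data in the store instead of files. */
final class StoreSessionHandler implements SessionHandlerInterface
{
    private KeyValueStore $store;
    private int $ttl;

    public function __construct(KeyValueStore $store, int $ttlSeconds = 1200)
    {
        $this->store = $store;
        $this->ttl = $ttlSeconds;
    }

    public function open($path, $name): bool { return true; }

    public function close(): bool { return true; }

    // PHP expects an empty string when the session does not exist yet.
    public function read($id): string
    {
        return $this->store->get('sess:' . $id) ?? '';
    }

    // Every write refreshes the TTL, so active sessions stay alive.
    public function write($id, $data): bool
    {
        $this->store->set('sess:' . $id, $data, $this->ttl);
        return true;
    }

    public function destroy($id): bool
    {
        $this->store->delete('sess:' . $id);
        return true;
    }

    // Nothing to sweep: entries expire through the store's TTL.
    public function gc($max_lifetime): int { return 0; }
}

// Possible wiring before session_start() (assumption, adapt to your setup):
//   session_set_save_handler(new StoreSessionHandler($store), true);
//   session_start();
```

Letting the store expire keys through a TTL means there is no separate cleanup pass over old sessions, though the sessions still occupy memory for their full lifetime.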
You should also install APC to get the best performance.
How is this typically handled? Like facebook, flickr, etc....
Nowadays most major public APIs require OAuth (although I think authentication via sessions is easier to implement). It has become the de facto standard for authenticating without having to share passwords. The creator of PHP (Rasmus) has written a tutorial, "Writing an OAuth Provider Service". Searching for "oauth php" on Google should turn up more than enough information.
Also, most of Facebook's site now runs on HipHop instead of plain old PHP to speed things up. Facebook has open-sourced a lot of its work, which you could/should use.