Leaderboard design and performance in oracle_问答_开发者

I'm developing a game and I'm using a leaderboard to 开发者_开发百科keep track of a player's score. There is also the requirement to keep track of about 200 additional statistics. These stats are things like: kills, deaths, time played, weapon used, achievements gained and so on.

What players will be interested in is is the score,kills,deaths and time played. All the other stats are not necessarily needed to be shown in the game but should be accessible if I want to view them or compare them against other players. The expected number of players to be stored in this leaderboard table is about 2 million.

Currently the design is to store a player id together will all the stats in one table, for instance:

player_id,points,stat_1 .. stat_200,date_created,date_updated

If I want to show a sorted leaderboard based on points then I would have to put an index on points and do a sort on it with a select query and limit the results to return say 50 every time. There are also ideas to be able to have a player sort the leaderboard on a couple of other stats like time played or deaths up to a maximum of say 5 sortable stats.

The number of expected users playing the game is about 40k concurrently. Maybe a quarter of them, but this is really a ballpark figure, will actively browse the leaderboard, the rest will just play the game and upload their scores when they are finished.

I have a number of questions about this approach below:

It seems, but I have my doubts, that the consensus is that leaderboards with millions of records that should be sortable on a couple of stats don't scale very well in a RDBMS. Is this correct ?
Is sorting the leaderboard on points through a select query, assuming we have an index on it, going to be extremely slow and if so how can I work around this ?
Should I split up the storing of the additional stats that are not to be sorted in a separate table or is there another even better approach ?
Is caching the sorted results in memory or in a separate table going to be needed, keeping the expected load in mind, and if so which solutions or options should I consider ?

If my approach is completely wrong and I would be better of doing things like this in another way please let me know, even options like NoSQL solutions in cloud hosting environments are open to be considered.

Cheers

1) With multiple indexes it will become more costly to update the table. It all boils down to how often each player status is written to the db.

2) It will be very fast as long as the indexes are small enough to fit into RAM. After that, performance takes a big hit.

3) Sometimes you can gain performance if you add all fields you need to the index, cause then the DBMS doesn't need to access the table at all. This approach has the highest probability to work if the accessed fields are small compared to the size of a row.

4) Oracle will probably be good att doing the caching for you, but if you have a massive load of users all doing the same query it is probably better to run that query regularly and store the result in memory (or a memory-mapped file). For instance, if the high-score list is accessed 50 times/second you can decrease the load caused by that question by 99% by dumping it every 2 seconds. My advice on this is: don't do it unless you need it. Measure the performance first, and add it if necessary.

I've been working on a game with a leaderboard myself recently, using MS SQL Server rather than Oracle, and though the number of records and players aren't the same, here's what I've learnt - in answer to your questions:

As long as you have the right underlying hardware, creating a leaderboard with millions of records and sorting on score etc. should work just fine - databases are really, really efficient at querying and sorting based on indexes.
No, it will be fast.
I see no reason to partition into other tables - you'll have to join to those tables to retrieve the data, and that will incur a performance penalty. Though this might be the issue the normalization comment was aimed at.
I assume you will need to include caching to reach the scale you mention; I wouldn't cache in the database layer (your table is effectively a denormalized, flat record already - I don't think you can partition it much more). Not sure what other layers you've got, but I'd look at how "cacheable" your data is (sounds like leaderboards are fairly static), and cache either in the layer immediately above the database, or add something like ehcache to the mix.

General points:

I'd try it out to get a feel for how it would work. Use something like dbmonster to populate a test system with millions of records, and query against that puppy to get a feel for what works and doesn't.
Once you have that up and running, I'd invest in some more serious load and performance testing before deciding to add caching etc. - the more complex you make the architecture, the harder it is to debug, the more costly it is to build, and the more there is to go wrong. So, only add caching if you really need to because you can prove - through load and performance tests - that you can't meet your response time goals.
Whilst it's true that adding indexes to a table slows down insert/update/delete statements, in most cases that's a negligible penalty - I'd definitely not worry too much about it at this stage.

I don't like tables having hundreds of columns, to begin with but it could be ok. Personally I would prefer having separate ID table and scores table having ID, score types and values, both indexed on only the ID columns. If you organize them as cluster, the parent and child records are all fetched in 1 IO.

The number of transactions you mention asks for some scalability. You have no real idea about the load. I assume there is some application server[farm] that handles the requests.

That is a good fit for the Oracle In-Memory Database Cache option. See result caches ..... what about heavily modified data. This is a smart way of caching you Oracle data on the application server. You create a cache grid, consisting of at least one grid member and for best performance, combine them with the application server[s]. When you add application server, you automatically add Cache Grid Members. It works very well, it is the good old TimesTen technology that is integrated in the database.

You can make the combination, but don't have to. If you don't, you have a no top performance but are more flexible in the number of Grid Members.

meh - millions of records? not a big table.

I'd just create the table (avoid the "stat_1, stat_2" naming - give them their proper names, e.g. "score", "kill_count", etc.), add indexes with leading columns on what the users are most likely to want to sort on (that way Oracle can avoid a sort by using the index to access the table in sorted order).

If the number of stats grows too large, you could "partition" it vertically - e.g. have most of the most frequently accessed stats in one table, then have one or more other tables which have extra stats. Each table would have an identical primary key.