开发者

How to write efficient MySQL query to delete specific rows depending on constraints

开发者 https://www.devze.com 2022-12-17 01:09 出处:网络
I got a question regarding my MySQL-database and would like to get input on what would be most efficient.

I got a question regarding my MySQL-database and would like to get input on what would be most efficient.

My problem is as follows,

I'm developing premium functionality for my board game web site. One premium functionality would be that all the games a user has played would be stored "forever" (for the user to look up afterw开发者_JAVA技巧ards). For normal users games older than 18 months are deleted.

Now I need to figure out an effective way to delete the games (which is more than 18 months old) for normal non premium users and keep games for premium users.

Simplifying things I got two tables (in reality there's one more table which stores the game participants for each game):

Games,

 id=INT
 play_date=DATETIME
 end_score=INT
 player_id_1=INT
 player_id_2=INT

Users,

 id=INT
 premium=BOOLEAN (true=enabled, false=not enabled)

The user table contains 300.000+ rows while the Games table contain a few million rows. Each day approx 20.000 games are added to the Games table.

What would be the most efficient way to remove games older than 18 months from NON-premium users.

So far we've removed games older than 18 months for ALL users each Monday morning.

Now I need to take premium-value and game date into account.

A few solutions(?):

  • JOIN'ing the tables, altough we're talking million of rows in the Games table, this would be a no-no?
  • Get each game-entry older than 18 months, then get each users entry from player_id_1 & player_id_2 and if ANYONE of these are premium, let the game be, else delete it if it's older than 18 months. So for one week this could be 20k*7=140k worth of games.
  • Above solution except I do it every hour. Then there's approx 1000 games I need to get and check.
  • ?? add some kind of helper variable to the Games table? But what if a user stops using premium....

Any tips welcome...


Use an expire date and an index over that.

Allow NULL on that column.

Premium users will have NULL on their games.

Deleting games with expire_date < sysdate will use the index, it is, a INDEX RANGE SCAN (it has to be on orderable index, I mean, some kind of B-tree internal representation... but I'm not a specialist in MySQL).

EDIT

Or mantain a separate table of registry PKs with expire_date. So premium user records will not ocuppy space. Then you do a delete from xxx where pk in (select pk from the expiring_table).

But that's not a very good enhance over the previous solution.

Ages

Maybe you can use ages (by example 1 age = 1 month). And set a field "month_to_live" in the table. Each month you update the field += 1 for all records that are not null. That uses an equality filter. But as I said before, I'm not a specialist so I don't know how much optimization you can gain from that).

I should insist in the expire_date field (and you have the extra capability of extend the live period to anyone, individually, without incurring in extra overhead when you delete records).


Download a dump of the database and do some benchmarking on your computer. This should give you a pretty good idea of how the various solutions performe

Apart from that it's the usual reply for those sql query questions: Run "explain" on your queries and make sure that you have the correct indexes.


JOINing shouldn't be too bad, I guess you are not doing that query "live"? Another option would be to do the query when displaying games: If the user is premium, do no limit, otherwise limit the range.


You need to avoid solutions that UPDATE the games table every time a user changes status since that is unneccessary and slow.

Here's one. Suppose you run the query every day at the same hour :

DELETE games FROM games
JOIN users u1 ON (u1.id=games.player1_id AND NOT u1.premium)
JOIN users u2 ON (u1.id=games.player2_id AND NOT u2.premium)
WHERE games.play_date BETWEEN DATE_SUB( now(), INTERVAL 18 MONTH 1 DAY 1 HOUR)
AND DATE_SUB( now(), INTERVAL 18 MONTH )

Of course you should have an index on games( play_date ).

The idea here is that the date range check examines only games that potentially need deleting and have not already been examined by yesterday's query. Kind of a "rolling window".

On the contrary, this :

WHERE games.play_date < DATE_SUB( now(), INTERVAL 18 MONTH )

would examine all existing games in the table every time and would be very much slower.

0

精彩评论

暂无评论...
验证码 换一张
取 消