开发者

How do we greatly optimize our MySQL database (or replace it) when using joins?

开发者 https://www.devze.com 2023-01-03 16:01 出处:网络
This is the first time I\'m approaching an extremely high-volume situation. This is an ad server based on MySQL. However, the query that is used incorporates a lot of JOINs and is generally just slow.

This is the first time I'm approaching an extremely high-volume situation. This is an ad server based on MySQL. However, the query that is used incorporates a lot of JOINs and is generally just slow. (This is Rails ActiveRecord, btw)

sel = Ads.find(:all, :select => '*', :joins => "JOIN campaigns ON ads.campaign_id = campaigns.id JOIN users ON campaigns.user_id = users.id LEFT JOIN countries ON countries.campaign_id = campaigns.id LEFT JOIN keywords ON keywords.campaign_id = campaigns.id", :conditions => [flashstr + "keywords.word = ? AND ads.format = ? AND campaigns.cenabled = 1 AND (co开发者_如何学Cuntries.country IS NULL OR countries.country = ?) AND ads.enabled = 1 AND campaigns.dailyenabled = 1 AND users.uenabled = 1", kw, format, viewer['country'][0]], :order => order, :limit => limit)

My questions:

  1. Is there an alternative database like MySQL that has JOIN support, but is much faster? (I know there's Postgre, still evaluating it.)

  2. Otherwise, would firing up a MySQL instance, loading a local database into memory and re-loading that every 5 minutes help?

  3. Otherwise, is there any way I could switch this entire operation to Redis or Cassandra, and somehow change the JOIN behavior to match the (non-JOIN-able) nature of NoSQL?

Thank you!


EDIT: here are more details:

Full executed SQL with flattened select (truncated above):

SELECT campaigns.id, campaigns.guid, campaigns.user_id, campaigns.dailylimit, campaigns.impressions, campaigns.cenabled, campaigns.dayspent, campaigns.dailyenabled, campaigns.fr, ads.id, ads.guid, ads.user_id, ads.campaign_id, ads.format, ads.enabled, ads.datafile, ads.data1, ads.data2, ads.originalfilename, ads.aid, ads.impressions, countries.id, countries.guid, countries.campaign_id, countries.country, keywords.id, keywords.campaign_id, keywords.word, keywords.bid FROM ads JOIN campaigns ON ads.campaign_id = campaigns.id JOIN users ON campaigns.user_id = users.id LEFT JOIN countries ON countries.campaign_id = campaigns.id LEFT JOIN keywords ON keywords.campaign_id = campaigns.id WHERE (keywords.word = 'design' AND ads.format = 10 AND campaigns.cenabled = 1 AND (countries.country IS NULL OR countries.country = 82) AND ads.enabled = 1 AND campaigns.dailyenabled = 1 AND users.uenabled = 1 AND ads.datafile != '') ORDER BY keywords.bid DESC LIMIT 1,1

EXPLAIN/execution plan:

+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+
| id | select_type | table     | type   | possible_keys    | key         | key_len | ref                                | rows | Extra                                        |
+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | keywords  | ref    | campaign_id,word | word        | 257     | const                              |    9 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | ads       | ref    | campaign_id      | campaign_id | 4       | e_development.keywords.campaign_id |    8 | Using where                                  | 
|  1 | SIMPLE      | campaigns | eq_ref | PRIMARY          | PRIMARY     | 4       | e_development.keywords.campaign_id |    1 | Using where                                  | 
|  1 | SIMPLE      | users     | eq_ref | PRIMARY          | PRIMARY     | 4       | e_development.campaigns.user_id    |    1 | Using where                                  | 
|  1 | SIMPLE      | countries | ALL    | campaign_id      | NULL        | NULL    | NULL                               |    4 | Using where                                  | 
+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+

(this is on a development database, which doesn't have nearly as many rows as the production version.)

DEFINED INDICES:

ads -> id (primary, autoinc) + aid (unique) + campaign_id (index) + user_id (index)
campaigns -> id (primary, autoinc) + user_id (index)
countries -> id (primary, autoinc) + campaign_id (index) + country (index) + user_id (index)
keywords -> id (primary, autoinc) + campaign_id (index) + word (index) + user_id (index)
user -> id (primary, autoinc)


Database theory and the nominal practice exist to provide a framework for a majority of cases. Not every database usage pattern fits neatly into 3rd normal form. Hence the emergence of NoSQL. These database don't work well in a majority of cases but do work great in specific cases. One reason they work well is because they DON'T work like a normal RDBMS. Cassandra does have some facility for 'joining' but I don't remember the exact details. If you want a quick understanding I'd recommend the Digg developers blog. There's a nice simple description.

The problem is that I'll bet you a pickle that joining 4 tables would be slower than mySQL. And the only way to know for sure would be learning a new DBMS, installing it, tuning the install as well as you can tune MySQL and setting up all your data and .... you'll like find out MySQL does pretty damn good.

Trying to solve the EXACT SAME problem the EXACT SAME way with a different engine won't cut it... you have to THINK like a NoSQL developer, not a RDBMS developer using NoSQL.

But you can think about the problem as Frustrated suggests.

Why do we have Third Normal Form? Ease of Update mainly. I update one row instead of dozens. It also helps constrain data, if I carefully control addition of countries in the country table, I'll never get a bad one in the campaign table. After that, 3NF doesn't make querying faster, which is why we invented reporting databases, OLAP, Cubes, Star Schemas.

the Key is that it's a different structure for reporting vs editing/capturing.

As Frustrated said, determine the speed of change in your underlying data. If you're really adding countries every 5 minutes, I'll be stunned. Campaigns? probably occasional? Ads? a couple times a day. How long would it take to build a fully flattened table and index it? How many rows does that produce? if that cycle time is much shorter than your update frequency... build that and see. Test the query speed. That's a cheaper experiment than going for a whole new DB.


Have you analysed your execution plan? Have you analysed your indices?

My first guess would be that you need an index on campaigns for user_id, index on countries for campaign_id, on keywords on campaign_id... maybe others. You need to get an execution plan to see what your query is doing.

The other option: How often does the data in this result set change? By the minute? Hour? Day? If it's daily or hourly (well, several hours), it might be better to have a secondary table that contains ALL the columns (or just the columns that aren't likely to change frequently) of this result set and is populated by this query every n hours. Then your app would just query the secondary table (or maybe join with one table that has frequently changing data), it could be faster that way.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号