SELECT takes a long time. How can I solve this problem?

I have a big MySQL database, about 300 MB, with 4 tables: the first is about 200 MB, the second about 80 MB. The first table has 150,000 records and the second has 200,000.

I use an INNER JOIN across them.

The SELECT takes 3 seconds now that I use optimization and indexes (before that it took about 20-30 seconds). That is a fairly good result, but I need more, because the page loads in 7-8 seconds (3-4 for the SELECT, 1 for the count, 1 for other small queries, and 1-2 for page generation).

So what should I do? Maybe Postgres takes less time than MySQL? Or maybe it is better to use memcached, but in that case it could take a lot of memory (there are too many sorting variants).

Maybe somebody has another idea? I would be glad to hear it :)


OK, I see we need the queries :) I renamed the fields of table_1; the field_N names below are placeholders.

    CREATE TABLE `table_1` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `field_1` varchar(2048) DEFAULT NULL,
      `field_2` varchar(2048) DEFAULT NULL,
      `field_3` int(10) unsigned DEFAULT NULL,
      `field_4` text,
      `field_5` text,
      `field_6` text,
      `field_7` varchar(128) DEFAULT NULL,
      `field_8` text,
      `field_9` text,
      `field_10` text,
      `field_11` text,
      `field_12` text,
      `field_13` varchar(128) DEFAULT NULL,
      `field_14` text,
      `field_15` varchar(4000) DEFAULT NULL,
      `field_16` varchar(4000) DEFAULT NULL,
      `field_17` int(10) unsigned DEFAULT '1',
      `field_18` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      `field_19` text,
      `new` tinyint(1) NOT NULL DEFAULT '0',
      `applications` varchar(255) DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `indexNA` (`new`,`applications`) USING BTREE
    ) ENGINE=InnoDB AUTO_INCREMENT=153235 DEFAULT CHARSET=utf8;

CREATE TABLE  `table_2` (
  `id_record` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `catalog_name` varchar(512) NOT NULL,
  `catalog_url` varchar(4000) NOT NULL,
  `parent_id` int(10) unsigned NOT NULL DEFAULT '0',
  `checked` tinyint(1) NOT NULL DEFAULT '0',
  `level` int(10) unsigned NOT NULL DEFAULT '0',
  `work` int(10) unsigned NOT NULL DEFAULT '0',
  `update` int(10) unsigned NOT NULL DEFAULT '1',
  `type` int(10) unsigned NOT NULL DEFAULT '0',
  `hierarchy` varchar(512) DEFAULT NULL,
  `synt` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id_record`,`type`) USING BTREE,
  KEY `rec` (`id_record`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=14504 DEFAULT CHARSET=utf8;

CREATE TABLE  `table_3` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `id_table_1` int(10) unsigned NOT NULL,
  `id_category` int(10) unsigned NOT NULL,
  `work` int(10) unsigned NOT NULL DEFAULT '1',
  `update` int(10) unsigned NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  KEY `site` (`id_table_1`,`id_category`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=203844 DEFAULT CHARSET=utf8;

The queries are: 1) get the overall count (takes less than 1 second):

SELECT count(table_1.id) FROM table_1
 INNER JOIN table_3 ON table_3.id_table_id = table_1.id
 INNER JOIN table_2 ON table_2.id_record = table_3.id_category
WHERE (table_2.type = 0)
  AND (table_3.work = 1 AND table_2.work = 1)
  AND (table_1.new = 1)
  AND 1 IN (table_1.applications)

2) get the list for a page with a limit (takes 3 to 7 seconds, depending on the count):

SELECT table_1.field_1, table_1.field_2, table_1.field_3, table_1.field_4, table_2.catalog_name FROM table_1
 INNER JOIN table_3 ON table_3.id_table_id = table_1.id
 INNER JOIN table_2 ON table_2.id_record = table_3.id_category
WHERE (table_2.type = 0)
  AND (table_3.work = 1 AND table_2.work = 1)
  AND (table_1.new = 1)
  AND 1 IN (table_1.applications)
LIMIT 10 OFFSET 10


Do Not Change the DBMS

I would not suggest changing your DBMS; it could be very disruptive. If you have used MySQL-specific queries that are not compatible with Postgres, you might need to redo the whole indexing, etc. Even then, there is no guarantee of a performance improvement.

Caching is a Good Option

Caching is a really good idea. It takes load off your DBMS. It is best suited to workloads with heavy reads and light writes, so objects stay in the cache longer. Memcached is a really good caching mechanism, and it is really simple. Rapidly scaling sites (Facebook and the like) make heavy use of memcached to take load off the database.

How to Scale Up Really Big Time

You do not have very heavy data, though, so caching would most likely help you. The next step beyond caching is NoSQL-based solutions like Cassandra. We use Cassandra in one of our applications that has heavy read and write traffic (50:50) and a really large, fast-growing database, and it gives good performance. In your case, though, I guess Cassandra is overkill.

But...

Before you dive into any serious changes, I would suggest really looking into your indexes. Try scaling vertically. Look into slow queries (search for the slow query logging directive). Hopefully MySQL will be faster after optimizing these things and you will not need additional tools.
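For reference, a minimal sketch of enabling the slow query log at runtime; the variable names are standard MySQL, but the one-second threshold and the log path are just example values:

    -- Log statements slower than one second, without restarting the server.
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL long_query_time = 1;  -- threshold in seconds (example value)
    SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';  -- example path

    -- Confirm the settings took effect.
    SHOW GLOBAL VARIABLES LIKE 'slow_query_log%';
    SHOW GLOBAL VARIABLES LIKE 'long_query_time';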


You should look into indexing specific to the most frequent/time-consuming queries you use. Check this post on indexing for MySQL.
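As a sketch against the schema posted above (the index names here are made up, and the column choices are only a guess at what these particular queries could use; verify with EXPLAIN before and after):

    -- table_3 is joined on id_table_1 and filtered on work;
    -- including id_category also covers the join column to table_2:
    ALTER TABLE table_3 ADD INDEX idx_t3_join (id_table_1, work, id_category);

    -- table_2 is joined on id_record and filtered on type and work:
    ALTER TABLE table_2 ADD INDEX idx_t2_join (id_record, type, work);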


Aside from all the suggestions others have offered, I've slightly altered your queries, though I'm not positive of the performance impact under MySQL. First, I've added STRAIGHT_JOIN so the optimizer doesn't try to decide which order to join the tables FOR you.

Next, I moved the "AND" conditions into the respective JOIN clauses for tables 2 & 3.

Finally, the join from table 1 to 3 had (in your post)

   table_3.id_table_id = table_1.id

instead of

   table_3.id_table_1 = table_1.id

Additionally, I can't speak to the performance, but maybe try a stand-alone index on just the "new" column for an exact match first, without regard to the "applications" column. I don't know if the compound index is causing an issue, since you are using an "IN" on applications, which is not truly an indexable search basis.
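A minimal sketch of that stand-alone index (the index name is made up). Also, if `applications` really holds a comma-separated list, the comparison the question seems to intend would be MySQL's FIND_IN_SET rather than IN, though that is an assumption about the data:

    -- Stand-alone index on the exact-match column (illustrative name):
    ALTER TABLE table_1 ADD INDEX idx_new (new);

    -- If `applications` holds a list such as '1,5,9', this matches one
    -- value inside it (assumes comma-separated data; cannot use an index):
    -- ... AND FIND_IN_SET('1', table_1.applications) > 0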

Here are the modified queries:

SELECT STRAIGHT_JOIN 
        count(table_1.id) 
    FROM 
        table_1
            JOIN table_3 
                ON table_1.id = table_3.id_table_1
                   AND table_3.work = 1
                JOIN table_2 
                    ON table_3.id_category = table_2.id_record
                    AND table_2.type = 0
                    AND table_2.work = 1
    WHERE 
            table_1.new = 1
        AND 1 IN (table_1.applications)


SELECT STRAIGHT_JOIN 
        table_1.field_1, 
        table_1.field_2, 
        table_1.field_3, 
        table_1.field_4, 
        table_2.catalog_name 
    FROM 
        table_1
            JOIN table_3 
                ON table_1.id = table_3.id_table_1
                AND table_3.work = 1
                JOIN table_2 
                    ON table_3.id_category = table_2.id_record
                    AND table_2.type = 0
                    AND table_2.work = 1
    WHERE 
            table_1.new = 1
        AND 1 IN (table_1.applications)
    LIMIT 10 OFFSET 10


You should also optimize your query.


Without a look at the statements, this question can only be answered with theoretical approaches. Just a few ideas to take into consideration...

The SELECT-Statement...

First of all, make sure that your query is as "good" as it can be. Are there any indexes you might have missed? Do those indexes use the same field types, and so on? Can you perhaps narrow the query down so the database has less to work on?

The Query cache...

If your query is repeated pretty often, it might help to use the query cache or, in case you're already using it, give it more RAM.
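As a sketch, assuming MySQL 5.x (the query cache was removed entirely in MySQL 8.0, and the 64 MB size below is just an example value):

    -- Check whether the query cache is enabled and how well it is hitting:
    SHOW GLOBAL VARIABLES LIKE 'query_cache%';
    SHOW GLOBAL STATUS LIKE 'Qcache%';

    -- Give it more RAM (64 MB here is an example value):
    SET GLOBAL query_cache_size = 67108864;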

The Hardware...

Of course, different RDBMSs are slower or faster than others, depending on their strengths and weaknesses, but if your query is already optimized into oblivion, you can only make it faster by scaling up the database server (better CPU, better I/O, and so on, depending on where the bottleneck is).

Other Factors...

If all of this is maxed out, maybe try speeding up the other components (1-2 seconds for page generation looks pretty slow to me).

For all the factors mentioned, there is a huge number of ideas and posts on stackoverflow.com.


That is actually not such a big database, and certainly not too much for your database system. As a comparison, the database we are using is currently around 40 GB. It's on MS SQL Server, though, so it's not directly comparable, but there is no dramatic difference between the database systems.

My guess is that you haven't been completely successful in using indexes to speed up the query. You should look at the execution plan for the query and see if you can spot which part of the execution is taking most of the time.
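In MySQL that means prefixing the query with EXPLAIN. A minimal sketch using the count query from the question (with the join column corrected to id_table_1, as pointed out in an earlier answer):

    -- Watch for type = ALL (full table scan), key = NULL (no index used),
    -- and large values in the rows column:
    EXPLAIN
    SELECT count(table_1.id) FROM table_1
     INNER JOIN table_3 ON table_3.id_table_1 = table_1.id
     INNER JOIN table_2 ON table_2.id_record = table_3.id_category
    WHERE table_2.type = 0
      AND table_3.work = 1 AND table_2.work = 1
      AND table_1.new = 1
      AND 1 IN (table_1.applications);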
