MySQL count(*) , Group BY and INNER JOIN_问答_开发者

I have a really bad time with a query on MySQL 5.1. I simplified the 2 tables I make a JOIN on :

CREATE TABLE  `jobs` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 255 ) NOT NULL
) ENGINE = MYISAM ;

AND

CREATE TABLE `jobsCategories` (
 `jobID` int(11) NOT NULL,
 `industryID` int(11) NOT NULL,
 KEY `jobID` (`jobID`),
 KEY `industryID` (`industryID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

The query is straight forward :

SELECT count(*) as nb,industryID 
FROM  jobs J 
INNER JOIN jobsCategories C ON C.jobID=J.id 
GROUP BY industryID 
ORDER BY nb DESC;

I got around 150000 records into the jobs table, and 350000 records into the jobsCategories table, and I have 30 industries;

The query takes approximatively 50 seconds to execute !!!

Do you have any idea why it takes so long? How could I optimize the structure of this database? Profilling the query show me that 99% of the execution time is spend on copying on tmp tables.

EXPLAIN <query> gives me : 


*************************** 1. row ***************************
       id: 1
select_type: SIMPLE
    table: J
     type: index
possible_keys: PRIMARY
      key: PRIMARY
  key_len: 4
      ref: NULL
     rows: 178950
    Extra: Using index; Using temporary; Using file开发者_开发技巧sort
*************************** 2. row ***************************
       id: 1
 select_type: SIMPLE
    table: C
     type: ref
possible_keys: jobID
      key: jobID
  key_len: 8
      ref: J.id
     rows: 1
    Extra: Using where
2 rows in set (0.00 sec)

About the memory :

free -m  : 

total       used       free     shared    buffers     cached
Mem:          2011       1516        494          0          8       1075
-/+ buffers/cache:        433       1578
Swap:         5898        126       5772

With the FORCE INDEX suggested below

select count(*) as nb, industryID 
from 
    jobs J 
    inner join jobsCategories C force index (industryID) on (C.jobID = J.id )
group by industryID 
order by nb DESC;

SHOW PROFILE;

gives me :

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000095 |
| Opening tables       | 0.000014 |
| System lock          | 0.000008 |
| Table lock           | 0.000007 |
| init                 | 0.000032 |
| optimizing           | 0.000011 |
| statistics           | 0.000032 |
| preparing            | 0.000016 |
| Creating tmp table   | 0.000031 |
| executing            | 0.000003 |
| Copying to tmp table | 3.301305 |
| Sorting result       | 0.000028 |
| Sending data         | 0.000024 |
| end                  | 0.000003 |
| removing tmp table   | 0.000009 |
| end                  | 0.000004 |
| query end            | 0.000003 |
| freeing items        | 0.000029 |
| logging slow query   | 0.000003 |
| cleaning up          | 0.000003 |
+----------------------+----------+

I guess my RAM (2Gb) is not large enough. How can I be certain this is the case?

Firstly I think that you don't need to join table jobs in order to get the same result (unless you have some garbage data in table jobsCategories):

select count(*) as nb, industryID 
from jobsCategories
group by industryID 
order by nb DESC;

Otherwise you may try to force index on industryID:

select count(*) as nb, industryID 
from 
    jobs J 
    inner join jobsCategories C force index (industryID) on (C.jobID = J.id )
group by industryID 
order by nb DESC;

change your tables to InnoDB =) InnoDB is good managing big tables and the COUNT(*) to make it faster

http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-innodb/

Good Luck

EDIT: after testing, it seems that MyISAM is faster than InnoDB when using COUNT(*) when there is no WHERE clause:

http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/

anyway, i've tested your exact query simulating the tables that you have (150k Jobs and 300k JobsCategories) using MyISAM tables and it took 1.5 seconds so maybe your problem is elsewhere.. it's all i can tell you =P

Hope I'm not misinterpreting the reading, but from what I see, you don't need ANY join. Since your grouping is how many jobs fall under each respective industry, its all in your job categories table, why join to the actual job table for the title of the job since that is not even being returned

select IndustryID,
       count(*) JobsPerIndustry
   from JobCategories
   group by IndustryID

EDIT PER COMMENT / FEEDBACK...

That definitely makes a difference... adding a criteria associated with a job... Ensure your Jobs table has an index on the element you are expecting to allow limiting based on... Then follow similar query like you originally had. Ensure your Jobs table has an index on CountryID.

SELECT
      count(*) as nb,
      industryID 
   FROM  jobs J 
      JOIN jobsCategories C 
         ON J.ID = C.jobID
   WHERE 
      J.countryID=1234
   GROUP BY 
      industryID 
   ORDER BY 
      nb DESC;