SQL performance on multiple id matching and a Join statement_问答_开发者

Consider this query:

SELECT DISTINCT (linkindex_tags.link_id)
   , links_sorted.link_title
   , links_sorted.link_url
FROM linkindex_tags
INNER JOIN links_sorted ON links_sorted.link_id = linkindex_tags.link_id
ORDER BY
(
IF (word_id = 400, 1,0)+
IF (word_id = 177, 1,0)+
IF (word_id = 114, 1,0)+
IF (word_id = 9, 1,0)+
IF (word_id = 270, 1,0)+
IF (word_id = 715, 1,0)+
IF (word_id = 279, 1,0)+
IF (word_id = 1, 1,0)+
IF (word_id = 1748, 1,0)
)  DESC
LIMIT 0,15;

So looking for matches to a series of word_id's and odering by the score of those matches (e.g. find a link with 5 word_ids then it's a score of 5)

The linkindex_tags table is currently 552,196 rows (33 MB) but will expan to many millions The link_sorted table is currently 823,600 (558MB - obv more data per row) rows but will also expand to more. The linkindex_tags table is likely to be around 8-12 times larger than links_sorted.

Execution Time : 7.069 sec on a local i3 core windows 7 machine. My server is CentOs 64bit 8GB ram Intel Xeon 3470 (Quad Core) - so that will aid in the matter slightly I guess as can assign decent RAM allocation.

It is running slowly and was wondering if my approach is all wrong. Here's the slow bits from the profile breakdown:

Copying to tmp table - (time) 3.88124 - (%) 55.08438

Copying to tmp table on disk - (time) 2.683123 -(%) 8.08010

converting HEAP to MyISAM - (time) 0.37656 - (%) 5.34432

Here's the EXPLAIN:

id -    1
select_type -   SIMPLE
table - linkindex_tags
type -  index
possible_keys - link_id,link_id_2
key -   link_id
key_len -   8
ref -   \N
rows -  552196
Extra - Using index; Using temporary; Using filesort

2nd row

id -    1
select_type -   SIMPLE
table - links_sorted
type -  eq_ref
possible_keys - link_id
key -   link_id
key_len -   4
ref -   flinksdb.linkindex_tags.link_id
rows -  1
Extra -

And finally the 2 table schema's:

CREATE TABLE IF NOT EXISTS `linkindex_开发者_JAVA百科tags` (
  `linkindex_tag_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `link_id` int(10) unsigned NOT NULL,
  `word_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`linkindex_tag_id`),
  UNIQUE KEY `link_id` (`link_id`,`word_id`),
  KEY `link_id_2` (`link_id`),
  KEY `word_id` (`word_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;

CREATE TABLE IF NOT EXISTS `links_sorted` (
  `link_sorted_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `site_id` int(10) unsigned NOT NULL,
  `link_id` int(10) unsigned NOT NULL,
  `link_title` char(255) NOT NULL,
  `link_duration` char(20) NOT NULL,
  `link_url` char(255) NOT NULL,
  `active` tinyint(4) NOT NULL,
  PRIMARY KEY (`link_sorted_id`),
  UNIQUE KEY `link_id` (`link_id`),
  KEY `link_title` (`link_title`,`link_url`,`active`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;

Have to stick with INT as may enter a range bigger than MEDIUMINT. Without the join, just getting the ids the query is fast now I've upped some MySQL settings.

Don't know too much about MySQL settings and their effects so if you need me to change a few settings and run some tests by all means fire away!

Oh and I played with the mysql.ini settings so they're like this - just guessing and toying really!

key_buffer = 512M 
max_allowed_packet = 1M 
table_cache = 512M 
sort_buffer_size = 512M 
net_buffer_length = 8K 
read_buffer_size = 512M 
read_rnd_buffer_size = 512K

How can I speed up this query?

A few comments:

DISTINCT
SELECT DISTINCT works on all the fields selected, no matter how many () you use, use a GROUP BY clause instead if you only want 1 field to be distinct.
Note that this will make the results of your query indeterminate!
Keep the distinct, or aggregate the other fields in a GROUP_CONCAT if you want to prevent that.

ORDER BY
A field can only have one value at a time, adding different IF's together, when there can be only one that matches is a waste of time, use an IN instead.
A boolean = 1 for true, 0 for false, you don't need an extra IF to assert that.

WHERE
If you have a lot of rows, consider adding a where that can reduce the number of rows under consideration, without altering the outcome.

?
Is the series: 400,177,114,9,270,715,279,1,1748 same sort of magical construct like the 4-8-15-16-23-42 in Lost?

SELECT lt.link_id
   , GROUP_CONCAT(ls.link_title) as link_titles
   , GROUP_CONCAT(ls.link_url) as link_urls
FROM linkindex_tags lt
INNER JOIN links_sorted ls ON ls.link_id = lt.link_id
WHERE lt.word_id <= 1748
GROUP BY lt.link_id
ORDER BY
(
  lt.word_id IN (400,177,114,9,270,715,279,1,1748)
)  DESC
LIMIT 15 OFFSET 0;