Consider this query:
SELECT DISTINCT (linkindex_tags.link_id)
, links_sorted.link_title
, links_sorted.link_url
FROM linkindex_tags
INNER JOIN links_sorted ON links_sorted.link_id = linkindex_tags.link_id
ORDER BY
(
IF (word_id = 400, 1,0)+
IF (word_id = 177, 1,0)+
IF (word_id = 114, 1,0)+
IF (word_id = 9, 1,0)+
IF (word_id = 270, 1,0)+
IF (word_id = 715, 1,0)+
IF (word_id = 279, 1,0)+
IF (word_id = 1, 1,0)+
IF (word_id = 1748, 1,0)
) DESC
LIMIT 0,15;
So looking for matches to a series of word_id's and odering by the score of those matches (e.g. find a link with 5 word_ids then it's a score of 5)
The linkindex_tags table is currently 552,196 rows (33 MB) but will expan to many millions The link_sorted table is currently 823,600 (558MB - obv more data per row) rows but will also expand to more. The linkindex_tags table is likely to be around 8-12 times larger than links_sorted.
Execution Time : 7.069 sec on a local i3 core windows 7 machine. My server is CentOs 64bit 8GB ram Intel Xeon 3470 (Quad Core) - so that will aid in the matter slightly I guess as can assign decent RAM allocation.
It is running slowly and was wondering if my approach is all wrong. Here's the slow bits from the profile breakdown:
Copying to tmp table - (time) 3.88124 - (%) 55.08438
Copying to tmp table on disk - (time) 2.683123 -(%) 8.08010 converting HEAP to MyISAM - (time) 0.37656 - (%) 5.34432Here's the EXPLAIN:
id - 1
select_type - SIMPLE
table - linkindex_tags
type - index
possible_keys - link_id,link_id_2
key - link_id
key_len - 8
ref - \N
rows - 552196
Extra - Using index; Using temporary; Using filesort
2nd row
id - 1
select_type - SIMPLE
table - links_sorted
type - eq_ref
possible_keys - link_id
key - link_id
key_len - 4
ref - flinksdb.linkindex_tags.link_id
rows - 1
Extra -
And finally the 2 table schema's:
CREATE TABLE IF NOT EXISTS `linkindex_开发者_JAVA百科tags` (
`linkindex_tag_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`link_id` int(10) unsigned NOT NULL,
`word_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`linkindex_tag_id`),
UNIQUE KEY `link_id` (`link_id`,`word_id`),
KEY `link_id_2` (`link_id`),
KEY `word_id` (`word_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;
CREATE TABLE IF NOT EXISTS `links_sorted` (
`link_sorted_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`site_id` int(10) unsigned NOT NULL,
`link_id` int(10) unsigned NOT NULL,
`link_title` char(255) NOT NULL,
`link_duration` char(20) NOT NULL,
`link_url` char(255) NOT NULL,
`active` tinyint(4) NOT NULL,
PRIMARY KEY (`link_sorted_id`),
UNIQUE KEY `link_id` (`link_id`),
KEY `link_title` (`link_title`,`link_url`,`active`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;
Have to stick with INT as may enter a range bigger than MEDIUMINT. Without the join, just getting the ids the query is fast now I've upped some MySQL settings.
Don't know too much about MySQL settings and their effects so if you need me to change a few settings and run some tests by all means fire away!
Oh and I played with the mysql.ini settings so they're like this - just guessing and toying really!
key_buffer = 512M
max_allowed_packet = 1M
table_cache = 512M
sort_buffer_size = 512M
net_buffer_length = 8K
read_buffer_size = 512M
read_rnd_buffer_size = 512K
How can I speed up this query?
A few comments:
DISTINCT
SELECT DISTINCT
works on all the fields selected, no matter how many ()
you use, use a GROUP BY
clause instead if you only want 1 field to be distinct.
Note that this will make the results of your query indeterminate!
Keep the distinct, or aggregate the other fields in a GROUP_CONCAT
if you want to prevent that.
ORDER BY
A field can only have one value at a time, adding different IF's together, when there can be only one that matches is a waste of time, use an IN
instead.
A boolean = 1 for true, 0 for false, you don't need an extra IF to assert that.
WHERE
If you have a lot of rows, consider adding a where that can reduce the number of rows under consideration, without altering the outcome.
?
Is the series: 400,177,114,9,270,715,279,1,1748
same sort of magical construct like the 4-8-15-16-23-42
in Lost?
SELECT lt.link_id
, GROUP_CONCAT(ls.link_title) as link_titles
, GROUP_CONCAT(ls.link_url) as link_urls
FROM linkindex_tags lt
INNER JOIN links_sorted ls ON ls.link_id = lt.link_id
WHERE lt.word_id <= 1748
GROUP BY lt.link_id
ORDER BY
(
lt.word_id IN (400,177,114,9,270,715,279,1,1748)
) DESC
LIMIT 15 OFFSET 0;
精彩评论