I have 2 large mysql tables: Articles and ArticleTopics. I want to query the DB and retrieve the last 30 articles published for a given topicID. My current query is rather slow. Any ideas on how to improve it?
More details:
The tables:
Articles (~1 million rows)
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| articleId | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(255) | NO | | NULL | |
| content | longtext | NO | | NULL | |
| pubDate | datetime | NO | MUL | NULL | |
+-----------+--------------+------+-----+---------+----------------+
ArticleTopics (~10 million rows)
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| articleId | int(11) | NO | MUL | NULL | |
| topicId | int(11) | NO | MUL | NULL | |
+-----------+--------------+------+-----+---------+-------+
And my query:
SELECT a.articleId, a.pubDate
FROM Articles a, ArticleTopics t
WHERE t.articleId=a.articleId AND t.topicId=3364
ORDER BY a.pubDate DESC LIMIT 30;
And the EXPLAIN of the query:
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys 开发者_JAVA百科 | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
| 1 | SIMPLE | t | ref | articleId,topicId,topicId_articleId | topicId_articleId | 4 | const | 4281 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | a | eq_ref | PRIMARY,articleId_pubDate | PRIMARY | 4 | t.articleId | 1 | |
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
The slowness, I believe, is coming from the ORDER BY a.pubDate DESC
. I can greatly improve performance by faking it a bit by instead doing an ORDER BY t.articleId DESC
and having an index in ArticleTopics
on both articleId
& topicId
, since in general, the articleIds are in the same order as pubDates. They are not always, however, so it's not ideal. I'd like to be able to sort it on the pubDate.
Update: Added EXPLAIN.
You can rewrite the query in various ways to see if it speeds things up:
SELECT a.articleId, a.pubDate
FROM Articles a
WHERE a.articleId in (
select articleId
from ArticleTopics
where topicId = 3364
)
ORDER BY a.pubDate DESC LIMIT 30;
Or:
SELECT a.articleId, a.pubDate
FROM Articles a
INNER JOIN ArticleTopics t ON t.articleId = a.articleId
WHERE t.topicId = 3364
ORDER BY a.pubDate DESC LIMIT 30;
The important index for both queries is on Articles, and contains articleId as first field.
If article is a large table, with say the entire PDF in binary, you can create an index that fully covers the query. Full coverage means all selected fields are part of the index. For this query, a fully covering index would be (articleId, pubDate).
At this point, do you have an index on topicId
? If so, does the index contain only the topicId
field?
And maybe you can post the output of the EXPLAIN
query.
精彩评论