How can I efficiently identify the most popular strings in a large table?

Developer https://www.devze.com 2023-03-18 08:21 Source: web
Assuming a table of 50 million last names (for example), how would one efficiently identify the top 10,000?

Is there a more efficient query than this?

SELECT count(last_name) as cnt, last_name
FROM last_name_table
GROUP BY last_name
ORDER BY cnt DESC
LIMIT 10000;

Assuming:

CREATE TABLE last_name_table (
    `last_name` VARCHAR(255),
    KEY `last_name` (`last_name`)
);

I can get the top 1000 in 20 minutes. But the top 10000 is taking all day (literally). Any suggestions?


From your question, I assume that you don't need exact numbers and that approximate counts would be enough.

I suggest selecting a random subset of rows and doing all the needed calculations on it, then scaling the results up so that they approximate the whole table. You have enough data to get accurate results even with this approximation.
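The sampling idea above could be sketched in MySQL roughly as follows. The 1% sampling rate, the matching scale factor of 100, and the approx_cnt alias are assumptions chosen for illustration, not values from the question.

```sql
-- Sketch only: estimate name frequencies from a roughly 1% random sample.
-- The 0.01 rate and the scale factor of 100 are illustrative assumptions.
SELECT last_name,
       COUNT(*) * 100 AS approx_cnt   -- scale sample counts back up to the full table
FROM last_name_table
WHERE RAND() < 0.01                   -- keep each row with probability 0.01
GROUP BY last_name
ORDER BY approx_cnt DESC
LIMIT 10000;
```

This still reads every row once, but it groups and sorts only about 500,000 of the 50 million rows. Estimates for names near the bottom of the top 10,000 will be noisier than those for the most common names, so the exact cutoff should be treated as approximate.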


Suggestion: precalculate the count of each last_name and store it in a separate table.

Maintain it with triggers (if last_name_table does not receive thousands of inserts per minute, or if you need real-time statistics), or otherwise with a scheduled job that runs once a day (or once an hour, etc.).
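A minimal sketch of the trigger variant in MySQL might look like the following. The table name last_name_counts, the trigger name last_name_table_ai, and the column types are hypothetical. The sketch also assumes that last_name is never NULL on insert and that a VARCHAR(255) primary key fits within the index length limit of your storage engine and character set.

```sql
-- Hypothetical summary table: one row per distinct last name.
CREATE TABLE last_name_counts (
    last_name VARCHAR(255) NOT NULL,
    cnt       INT UNSIGNED NOT NULL,
    PRIMARY KEY (last_name),
    KEY cnt (cnt)                      -- lets ORDER BY cnt DESC use an index
);

-- One-off backfill from the existing data (NULL names are skipped).
INSERT INTO last_name_counts (last_name, cnt)
SELECT last_name, COUNT(*)
FROM last_name_table
WHERE last_name IS NOT NULL
GROUP BY last_name;

-- Keep the counts current on every insert (assumes NEW.last_name is not NULL).
CREATE TRIGGER last_name_table_ai
AFTER INSERT ON last_name_table
FOR EACH ROW
    INSERT INTO last_name_counts (last_name, cnt)
    VALUES (NEW.last_name, 1)
    ON DUPLICATE KEY UPDATE cnt = cnt + 1;

-- The top 10,000 then comes from the much smaller summary table.
SELECT last_name, cnt
FROM last_name_counts
ORDER BY cnt DESC
LIMIT 10000;
```

Rows deleted from or updated in last_name_table would need matching triggers that decrement or move the counts. With the scheduled variant, the backfill statement can simply be rerun into an emptied table.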


Some databases (SQL Server, for example) provide a "TOP" operator for this. It is a vendor extension rather than part of SQL92, but in a database that supports it you should be able to write
SELECT TOP 10000 ... FROM last_name_table;

However, MySQL has not implemented this, so you have to use LIMIT, as in your own query.


If you add a clause such as "HAVING count(last_name) > 10", it will strip all of the uncommon items out of your results. Doing it that way, you wouldn't need the "LIMIT" or the "ORDER BY", which might speed things up. Also, if cnt is stored alongside last_name (in the query above it is computed on the fly, so it cannot be indexed there), then an index on it might improve performance.
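As a rough illustration of the HAVING approach, the query below keeps only names that occur more than 10 times. The threshold of 10 is the example value from the answer, not a tuned number.

```sql
-- Sketch: drop uncommon names instead of sorting and limiting.
SELECT last_name,
       COUNT(*) AS cnt
FROM last_name_table
GROUP BY last_name
HAVING COUNT(*) > 10;   -- threshold is illustrative; raise it to return fewer names
```

Unlike LIMIT 10000, this does not return a fixed number of rows, so the threshold would need adjusting by trial to land near 10,000 names. The result also comes back unsorted by count.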
