开发者

how does MySQL implement the "group by"?

开发者 https://www.devze.com 2022-12-23 07:54 出处:网络
I read from the MySQLReference Manual and find thatwhen it can take use of index,it just do index scan,other it will create tmp tables and do things like filesort.

I read from the MySQL Reference Manual and find that when it can take use of index,it just do index scan,other it will create tmp tables and do things like filesort. And I also read from other article that the "Group By" result will sort by group by columns by default,if "order by null" clause added,it won't don filesort. The difference can be found from the "explain ..." clause. so my problem is:what is the difference between "group by" clause that with "order by null" and which doesn't have? I try to use profiling to see what mysql do on the background,and only see result like:

result for group clause without order by null:
|preparing                      | 0.000016 | 
| Creating tmp table             | 0.000048 | 
| executing                      | 0.000009 | 
| Copying to tmp table           | 0.000109 | 
**| Sorting result                 | 0.000023 |** 
| Sending data                   |开发者_开发知识库 0.000027 | 

result for clause with "order by null":
preparing                      | 0.000016 | 
| Creating tmp table             | 0.000052 | 
| executing                      | 0.000009 | 
| Copying to tmp table           | 0.000114 | 
| Sending data                   | 0.000028 | 

So I guess what MySQL do when the "order by null" added,it does not use filesort algorithm,maybe when it creates the tmp table,it uses index as well,and then use the index to do group by operation,when completed,it just read result from the table rows and does not sort the result.

But my original opinion is that MySQL can use quicksort to sort the items and then do group by,so the result will be sorted as well.

Any opinion appreciated,thanks.


mysql> select max(post_date),post_author from wp_posts
-> where id > 10 and id < 1000
-> group by post_author;
+———————+————-+
| max(post_date) | post_author |
+———————+————-+
| 2009-07-03 12:58:39 | 1 |
+———————+————-+
1 row in set (0.01 sec)

mysql> show profiles;
+———-+————+————————+
| Query_ID | Duration | Query |
+———-+————+————————+
| 1 | 0.00013200 | SELECT DATABASE() |
| 2 | 0.00030900 | show databases |
| 3 | 0.00030400 | show tables |
| 4 | 0.01180000 | select max(post_date),post_author from wp_posts where id > 10 and id < 1000 group by post_author |4 rows in set (0.00 sec)

mysql> show profile cpu,block io for query 4;
+———————-+———-+———-+————+————–+—————+
| Status | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out |
+———————-+———-+———-+————+————–+—————+
| starting | 0.000085 | 0.000000 | 0.000000 | 0 | 0 |
| Opening tables | 0.000010 | 0.000000 | 0.000000 | 0 | 0 |
| System lock | 0.000005 | 0.000000 | 0.000000 | 0 | 0 |
| Table lock | 0.000008 | 0.000000 | 0.000000 | 0 | 0 |
| init | 0.000029 | 0.000000 | 0.000000 | 0 | 0 |
| optimizing | 0.000014 | 0.000000 | 0.000000 | 0 | 0 |
| statistics | 0.000062 | 0.000000 | 0.000000 | 0 | 0 |
| preparing | 0.000016 | 0.000000 | 0.000000 | 0 | 0 |
| Creating tmp table | 0.000035 | 0.000000 | 0.000000 | 0 | 0 |
| executing | 0.000004 | 0.000000 | 0.000000 | 0 | 0 |
| Copying to tmp table | 0.011386 | 0.004999 | 0.006999 | 0 | 0 |
| Sorting result | 0.000044 | 0.000000 | 0.000000 | 0 | 0 |
| Sending data | 0.000036 | 0.000000 | 0.000000 | 0 | 0 |
| end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 |
| removing tmp table | 0.000012 | 0.000000 | 0.000000 | 0 | 0 |
| end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 |
| end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 |
| query end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 |
| freeing items | 0.000013 | 0.000000 | 0.000000 | 0 | 0 |
| closing tables | 0.000018 | 0.000000 | 0.000000 | 0 | 0 |
| logging slow query | 0.000003 | 0.000000 | 0.000000 | 0 | 0 |
| cleaning up | 0.000004 | 0.000000 | 0.000000 | 0 | 0 |
+———————-+———-+———-+————+————–+—————+
22 rows in set (0.00 sec)

mysql>
mysql>
mysql> select max(post_date),post_author from wp_posts
-> where id > 10 and id < 1000
-> group by post_author order by null;
+———————+————-+
| max(post_date) | post_author |
+———————+————-+
| 2009-07-03 12:58:39 | 1 |
+———————+————-+
1 row in set (0.01 sec)

mysql> show profiles;
+———-+————+—————–+
| Query_ID | Duration | Query
+———-+————+—————–+
|1 | 0.00013200 | SELECT DATABASE()
|2 | 0.00030900 | show databases
|3 | 0.00030400 | show tables
|4 | 0.01180000 | select max(post_date),post_author from wp_posts where id > 10 and id < 1000 group by post_author
|5 | 0.01177700 | select max(post_date),post_author from wp_posts where id > 10 and id < 1000 group by post_author order by null
5 rows in set (0.00 sec)
mysql> show profile cpu,block io for query 5;
+———————-+———-+———-+————+————–+—————+
| Status | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out |
+———————-+———-+———-+————+————–+—————+
| starting | 0.000097 | 0.000000 | 0.000000 | 0 | 0 |
| Opening tables | 0.000013 | 0.000000 | 0.000000 | 0 | 0 |
| System lock | 0.000006 | 0.000000 | 0.000000 | 0 | 0 |
| Table lock | 0.000008 | 0.000000 | 0.000000 | 0 | 0 |
| init | 0.000032 | 0.000000 | 0.000000 | 0 | 0 |
| optimizing | 0.000012 | 0.000000 | 0.000000 | 0 | 0 |
| statistics | 0.000065 | 0.000000 | 0.000000 | 0 | 0 |
| preparing | 0.000017 | 0.000000 | 0.000000 | 0 | 0 |
| Creating tmp table | 0.000040 | 0.000000 | 0.000000 | 0 | 0 |
| executing | 0.000003 | 0.000000 | 0.000000 | 0 | 0 |
| Copying to tmp table | 0.011369 | 0.005999 | 0.004999 | 0 | 0 |
| Sending data | 0.000040 | 0.000000 | 0.000000 | 0 | 0 |
| end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 |
| removing tmp table | 0.000031 | 0.000000 | 0.000000 | 0 | 0 |
| end | 0.000005 | 0.000000 | 0.000000 | 0 | 0 |
| end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 |
| query end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 |
| freeing items | 0.000012 | 0.000000 | 0.000000 | 0 | 0 |
| closing tables | 0.000009 | 0.000000 | 0.000000 | 0 | 0 |
| logging slow query | 0.000003 | 0.000000 | 0.000000 | 0 | 0 |
| cleaning up | 0.000003 | 0.000000 | 0.000000 | 0 | 0 |
+———————-+———-+———-+————+————–+—————+
21 rows in set (0.00 sec)

From here we can see that The second part does't have the "Sorting result" step,so a little impact on performance.


The GROUP BY clause allows a WITH ROLLUP modifier that causes extra rows to be added to the summary output. These rows represent higher-level (or super-aggregate) summary operations. ROLLUP thus allows you to answer questions at multiple levels of analysis with a single query. It can be used, for example, to provide support for OLAP (Online Analytical Processing) operations.

Suppose that a table named sales has year, country, product, and profit columns for recording sales profitability:

CREATE TABLE sales ( year INT NOT NULL, country VARCHAR(20) NOT NULL, product VARCHAR(32) NOT NULL, profit INT );

The table's contents can be summarized per year with a simple GROUP BY like this:

mysql> SELECT year, SUM(profit) FROM sales GROUP BY year; +------+-------------+ | year | SUM(profit) | +------+-------------+ | 2000 | 4525 | | 2001 | 3010 | +------+-------------+

This output shows the total profit for each year, but if you also want to determine the total profit summed over all years, you must add up the individual values yourself or run an additional query.

Or you can use ROLLUP, which provides both levels of analysis with a single query. Adding a WITH ROLLUP modifier to the GROUP BY clause causes the query to produce another row that shows the grand total over all year values:

mysql> SELECT year, SUM(profit) FROM sales GROUP BY year WITH ROLLUP; +------+-------------+ | year | SUM(profit) | +------+-------------+ | 2000 | 4525 | | 2001 | 3010 | | NULL | 7535 | +------+-------------+

The grand total super-aggregate line is identified by the value NULL in the year column.

ROLLUP has a more complex effect when there are multiple GROUP BY columns. In this case, each time there is a “break” (change in value) in any but the last grouping column, the query produces an extra super-aggregate summary row.

For example, without ROLLUP, a summary on the sales table based on year, country, and product might look like this:

mysql> SELECT year, country, product, SUM(profit) -> FROM sales -> GROUP BY year, country, product; +------+---------+------------+-------------+ | year | country | product | SUM(profit) | +------+---------+------------+-------------+ | 2000 | Finland | Computer | 1500 | | 2000 | Finland | Phone | 100 | | 2000 | India | Calculator | 150 | | 2000 | India | Computer | 1200 | | 2000 | USA | Calculator | 75 | | 2000 | USA | Computer | 1500 | | 2001 | Finland | Phone | 10 | | 2001 | USA | Calculator | 50 | | 2001 | USA | Computer | 2700 | | 2001 | USA | TV | 250 | +------+---------+------------+-------------+

The output indicates summary values only at the year/country/product level of analysis. When ROLLUP is added, the query produces several extra rows:

mysql> SELECT year, country, product, SUM(profit) -> FROM sales -> GROUP BY year, country, product WITH ROLLUP; +------+---------+------------+-------------+ | year | country | product | SUM(profit) | +------+---------+------------+-------------+ | 2000 | Finland | Computer | 1500 | | 2000 | Finland | Phone | 100 | | 2000 | Finland | NULL | 1600 | | 2000 | India | Calculator | 150 | | 2000 | India | Computer | 1200 | | 2000 | India | NULL | 1350 | | 2000 | USA | Calculator | 75 | | 2000 | USA | Computer | 1500 | | 2000 | USA | NULL | 1575 | | 2000 | NULL | NULL | 4525 | | 2001 | Finland | Phone | 10 | | 2001 | Finland | NULL | 10 | | 2001 | USA | Calculator | 50 | | 2001 | USA | Computer | 2700 | | 2001 | USA | TV | 250 | | 2001 | USA | NULL | 3000 | | 2001 | NULL | NULL | 3010 | | NULL | NULL | NULL | 7535 | +------+---------+------------+-------------+

For this query, adding ROLLUP causes the output to include summary information at four levels of analysis, not just one. Here is how to interpret the ROLLUP output:

*

  Following each set of product rows for a given year and country, an extra summary row is produced showing the total for all products. These rows have the product column set to NULL.
*

  Following each set of rows for a given year, an extra summary row is produced showing the total for all countries and products. These rows have the country and products columns set to NULL.
*

  Finally, following all other rows, an extra summary row is produced showing the grand total for all years, countries, and products. This row has the year, country, and products columns set to NULL.

Other Considerations When using ROLLUP

The following items list some behaviors specific to the MySQL implementation of ROLLUP:

When you use ROLLUP, you cannot also use an ORDER BY clause to sort the results. In other words, ROLLUP and ORDER BY are mutually exclusive. However, you still have some control over sort order. GROUP BY in MySQL sorts results, and you can use explicit ASC and DESC keywords with columns named in the GROUP BY list to specify sort order for individual columns. (The higher-level summary rows added by ROLLUP still appear after the rows from which they are calculated, regardless of the sort order.)

LIMIT can be used to restrict the number of rows returned to the client. LIMIT is applied after ROLLUP, so the limit applies against the extra rows added by ROLLUP. For example:

mysql> SELECT year, country, product, SUM(profit) -> FROM sales -> GROUP BY year, country, product WITH ROLLUP -> LIMIT 5; +------+---------+------------+-------------+ | year | country | product | SUM(profit) | +------+---------+------------+-------------+ | 2000 | Finland | Computer | 1500 | | 2000 | Finland | Phone | 100 | | 2000 | Finland | NULL | 1600 | | 2000 | India | Calculator | 150 | | 2000 | India | Computer | 1200 | +------+---------+------------+-------------+

Using LIMIT with ROLLUP may produce results that are more difficult to interpret, because you have less context for understanding the super-aggregate rows.

The NULL indicators in each super-aggregate row are produced when the row is sent to the client. The server looks at the columns named in the GROUP BY clause following the leftmost one that has changed value. For any column in the result set with a name that is a lexical match to any of those names, its value is set to NULL. (If you specify grouping columns by column number, the server identifies which columns to set to NULL by number.)

Because the NULL values in the super-aggregate rows are placed into the result set at such a late stage in query processing, you cannot test them as NULL values within the query itself. For example, you cannot add HAVING product IS NULL to the query to eliminate from the output all but the super-aggregate rows.

On the other hand, the NULL values do appear as NULL on the client side and can be tested as such using any MySQL client programming interface.


Group by is grouping record by some column. For example you have column "class" and you can group by this column so you will get records grouped based on this column values.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号