I'm trying to figure out storage requirements for diffe开发者_如何学JAVArent storage engines. I have this table:
CREATE TABLE `mytest` (
`num1` int(10) unsigned NOT NULL,
KEY `key1` (`num1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
When I insert some values and then run show table status;
I get the following:
+----------------+--------+---------+------------+---------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+ | Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment | +----------------+--------+---------+------------+---------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-------------------+----------+----------------+---------+ | mytest | InnoDB | 10 | Compact | 1932473 | 35 | 67715072 | 0 | 48840704 | 4194304 | NULL | 2010-05-26 11:30:40 | NULL | NULL | latin1_swedish_ci | NULL | | |
Notice avg_row_length is 35. I am baffled that InnoDB would not make better use of space when I'm just storing a non-nullable integer.
I have run this same test on myISAM and by default myISAM uses 7 bytes per row on this table. When I run
ALTER TABLE mytest MAX_ROWS=50000000, AVG_ROW_LENGTH = 4;
causes myISAM to finally correctly use 5-byte rows.
When I run the same ALTER TABLE statement for InnoDB the avg_row_length does not change.
Why would such a large avg_row_length be necessary when only storing a 4-byte unsigned int?
InnoDB
tables are clustered, that means that all data are contained in a B-Tree
with the PRIMARY KEY
as a key and all other columns as a payload.
Since you don't define an explicit PRIMARY KEY
, InnoDB
uses a hidden 6-byte column to sort the records on.
This and overhead of the B-Tree
organization (with extra non-leaf-level blocks) requires more space than sizeof(int) * num_rows
.
Here is some more info you might find useful.
InnoDB allocates data in terms of 16KB pages, so 'SHOW TABLE STATUS' will give inflated numbers for row size if you only have a few rows and the table is < 16K total. (For example, with 4 rows the average row size comes back as 4096.)
The extra 6 bytes per row for the "invisible" primary key is a crucial point when space is a big consideration. If your table is only one column, that's the ideal column to make the primary key, assuming the values in it are unique:
CREATE TABLE `mytest2`
(`num1` int(10) unsigned NOT NULL primary key)
ENGINE=InnoDB DEFAULT CHARSET=latin1;
By using a PRIMARY KEY like this:
- No INDEX or KEY clause is needed, because you don't have a secondary index. The index-organized format of InnoDB tables gives you fast lookup based on the primary key value for free.
- You don't wind up with another copy of the NUM1 column data, which is what happens when that column is indexed explicitly.
- You don't wind up with another copy of the 6-byte invisible primary key values. The primary key values are duplicated in each secondary index. (That's also the reason why you probably don't want 10 indexes on a table with 10 columns, and you probably don't want a primary key that combines several different columns or is a long string column.)
So overall, sticking with just a primary key means less data associated with the table + indexes. To get a sense of overall data size, I like to run with
set innodb_file_per_table = 1;
and examine the size of the data/database/*table*.ibd files. Each .ibd file contains the data for an InnoDB table and all its associated indexes.
To quickly build up a big table for testing, I usually run a statement like so:
insert into mytest
select * from mytest;
Which doubles the amount of data each time. In the case of the single-column table using a primary key, since the values had to be unique, I used a variation to keep the values from colliding with each other:
insert into mytest2
select num1 + (select count(*) from mytest2) from mytest2;
This way, I was able to get average row size down to 25. The space overhead is based on the underlying assumption that you want to have fast lookup for individual rows using a pointer-style mechanism, and most tables will have a column whose values serve as pointers (i.e. the primary key) in addition to the columns with real data that gets summed, averaged, and displayed.
IN addition to Quassnoi's very fine answer, you should probably try it out using a significant data set.
What I'd do is, load 1M rows of simulated production data in, then measure the table size and use that as a guide.
That's what I've done in the past anyway
MyISAM
MyISAM, except in really old versions, uses a 7-byte "pointer" for locating a row, and a 6-byte pointer inside indexes. These defaults lead to a huge max table size. More details: http://mysql.rjweb.org/doc.php/limits#myisam_specific_limits . The kludgy way to change those involves the ALTER .. MAX_ROWS=50000000, AVG_ROW_LENGTH = 4
that you discovered. The server multiplies those values together to compute how many bytes the data pointer needs to be. Hence, you stumbled on how to shrink the avg_row_length.
But you actually needed to declare a table with fewer than 7 bytes to hit it! The pointer size shows in multiple places:
Free space links in the .MYD default to 7 bytes. So, when you delete a row, a link is provided to the next free spot. That link needs to be 7 bytes (by default), hence the row size was artificially extended from the 4-byte INT to make room for it! (There are more details having to do with whether the column is NULLable , etc.
FIXED vs DYNAMIC row -- When the table is FIXED size, the "pointer" is a row number. For DYNAMIC, it is a byte offset into the .MYD.
Index entries must also point to data rows with a pointer. So your
ALTER
should have shrunk the .MYI file as well!
There are more details, but MyISAM is likely to go away, so this ancient history is not likely to be of concern to anyone.
InnoDB
https://stackoverflow.com/a/64417275/1766831
精彩评论