I know you can ALTER the column order in MySQL with FIRST and AFTER, but why would you w开发者_如何转开发ant to bother? Since good queries explicitly name columns when inserting data, is there really any reason to care what order your columns are in in the table?
Column order had a big performance impact on some of the databases I've tuned, spanning Sql Server, Oracle, and MySQL. This post has good rules of thumb:
- Primary key columns first
- Foreign key columns next.
- Frequently searched columns next
- Frequently updated columns later
- Nullable columns last.
- Least used nullable columns after more frequently used nullable columns
An example for difference in performance is an Index lookup. The database engine finds a row based on some conditions in the index, and gets back a row address. Now say you are looking for SomeValue, and it's in this table:
SomeId int,
SomeString varchar(100),
SomeValue int
The engine has to guess where SomeValue starts, because SomeString has an unknown length. However, if you change the order to:
SomeId int,
SomeValue int,
SomeString varchar(100)
Now the engine knows that SomeValue can be found 4 bytes after the start of the row. So column order can have a considerable performance impact.
EDIT: Sql Server 2005 stores fixed-length fields at the start of the row. And each row has a reference to the start of a varchar. This completely negates the effect I've listed above. So for recent databases, column order no longer has any impact.
Update:
In MySQL
, there may be a reason to do this.
Since variable datatypes (like VARCHAR
) are stored with variable lengths in InnoDB
, the database engine should traverse all previous columns in each row to find out the offset of the given one.
The impact may be as big as 17% for 20
columns.
See this entry in my blog for more detail:
- Choosing column order
In Oracle
, trailing NULL
columns consume no space, that's why you should always put them to the end of the table.
Also in Oracle
and in SQL Server
, in case of a large row, a ROW CHAINING
may occur.
ROW CHANING
is splitting a row that doesn't fit into one block and spanning it over the multiple blocks, connected with a linked list.
Reading trailing columns that didn't fit into the first block will require traversing the linked list, which will result in an extra I/O
operation.
See this page for illustration of ROW CHAINING
in Oracle
:
That's why you should put columns you often use to the beginning of the table, and columns you don't use often, or columns that tend to be NULL
, to the end of the table.
Important note:
If you like this answer and want to vote for it, please also vote for @Andomar
's answer.
He answered the same thing, but seems to be downvoted for no reason.
During Oracle training at a previous job, our DBA suggested that putting all the non-nullable columns before the nullable ones was advantageous... although TBH I don't remember the details of why. Or maybe it was just the ones that were likely to get updated should go at the end? (Maybe puts off having to move the row if it expands)
In general, it shouldn't make any difference. As you say, queries should always specify columns themselves rather than relying on the ordering from "select *". I don't know of any DB that allows them to be changed... well, I didn't know MySQL allowed it until you mentioned it.
Readability of the output when you have to type:
select * from <table>
in your database management software?
It's a very spurious reason, but at the moment I can't think of anything else.
Some badly-written applications might be dependent on column order / index instead of column name. They shouldn't be, but it does happen. Changing the order of the columns would break such applications.
No, the order of the columns in a SQL database table is totally irrelevant - except for display / printing purposes. There's no point in reordering columns - most systems don't even provide a way to do that (except dropping the old table and recreating it with the new column order).
Marc
EDIT: from the Wikipedia entry on relational database, here's the relevant portion which to me clearly shows that column order should never be of concern:
A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an unordered collection of items, although some DBMSs impose an order to their data. In mathematics, a tuple has an order, and allows for duplication. E.F. Codd originally defined tuples using this mathematical definition. Later, it was one of E.F. Codd's great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations. This insight is still being used today.
Beyond the obvious performance tuning, I just ran into a corner case where reordering columns caused a (previously functional) sql script to fail.
From the documentation "TIMESTAMP and DATETIME columns have no automatic properties unless they are specified explicitly, with this exception: By default, the first TIMESTAMP column has both DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP if neither is specified explicitly" https://dev.mysql.com/doc/refman/5.6/en/timestamp-initialization.html
So, a command ALTER TABLE table_name MODIFY field_name timestamp(6) NOT NULL;
will work if that field is the first timestamp (or datetime) in a table, but not otherwise.
Obviously, you can correct that alter command to include a default value, but the fact that a query that worked stopped working due to a column reordering made my head hurt.
As is often the case, the biggest factor is the next guy who has to work on the system. I try to have the primary key columns first, the foreign key columns second, and then the rest of the columns in descending order of importance / significance to the system.
The only reason I can think about is for debugging and fire-fighting. We have a table whose "name" column's appears about 10th on the list. It's a pain when you do a quick select * from table where id in (1,2,3) and then you have to scroll across to look at the names.
But that's about it.
In 2002, Bill Thorsteinson posted on the Hewlett Packard forums his suggestions for optimizing MySQL queries by reordering the columns. His post has since been literally copied and pasted at least a hundred times on the Internet, often without citation. To quote him exactly...
General rules of thumb:
- Primary key columns first.
- Foreign key columns next.
- Frequently-searched columns next.
- Frequently-updated columns later.
- Nullable columns last.
- Least-used nullable columns after more-frequently used nullable columns.
- Blobs in own table with few other columns.
Source: HP Forums.
But that post was made all the back in 2002! This advice was for MySQL version 3.23, more than six years before MySQL 5.1 would be released. And there are no references or citations. So, was Bill right? And how exactly does the storage engine work at this level?
- Yes, Bill was right.
- It all comes down to a matter of chained rows and memory blocks.
To quote Martin Zahn, an Oracle-certified professional, in an article on The Secrets of Oracle Row Chaining and Migration...
Chained rows affect us differently. Here, it depends on the data we need. If we had a row with two columns that was spread over two blocks, the query:
SELECT column1 FROM table
where column1 is in Block 1, would not cause any «table fetch continued row». It would not actually have to get column2, it would not follow the chained row all of the way out. On the other hand, if we ask for:
SELECT column2 FROM table
and column2 is in Block 2 due to row chaining, then you would in fact see a «table fetch continued row»
The rest of the article is a rather good read! But I am only quoting the part here that is directly relevant to our question at hand.
More than 18 years later, I gotta say it: thanks, Bill!
If you're going to be using UNION a lot, it makes matching columns easier if you have a convention about their ordering.
As noted, there are numerous potential performance issues. I once worked on a database where putting very large columns at the end improved performance if you didn't reference those columns in your query. Apparently if a record spanned multiple disk blocks, the database engine could stop reading blocks once it got all the columns it needed.
Of course any performance implications are highly dependent not just on the manufacturer that you're using, but also potentially on the version. A few months ago I noticed that our Postgres could not use an index for a "like" comparison. That is, if you wrote "somecolumn like 'M%'", it wasn't smart enough to skip to the M's and quit when it found the first N. I was planning to change a bunch of queries to use "between". Then we got a new version of Postgres and it handled the like's intelligently. Glad I never got around to changing the queries. Obviously not directly relevant here but my point is that anything you do for efficiency considerations could be obsolete with the next version.
Column order is almost always very relevant to me because I routinely write generic code that reads the database schema to create screens. Like, my "edit a record" screens are almost always built by reading the schema to get the list of fields, and then displaying them in order. If I changed the order of columns, my program would still work, but the display might be strange to the user. Like, you expect to see name / address / city / state / zip, not city / address / zip / name / state. Sure, I could put the display order of the columns in code or a control file or something, but then every time we added or removed a column we'd have to remember to go update the control file. I like to say things once. Also, when the edit screen is built purely from the schema, adding a new table can mean writing zero lines of code to create an edit screen for it, which is way cool. (Well, okay, in practice usually I have to add an entry to the menu to call the generic edit program, and I've generally given up on generic "select a record to update" because there are too many exceptions to make it practical.)
The only time you'll need to worry about column order is if your software specifically relies on that order. Typically this is due to the fact that the developer got lazy and did a select *
and then referred to the columns by index rather than by name in their result.
In general what happens in SQL Server when you change column order through Management Studio, is that it creates a temp table with the new structure, moves the data to that structure from the old table, drops the old table and renames the new one. As you might imagine, this is a very poor choice for performance if you have a large table. I don't know if My SQL does the same, but it is one reason why many of us avoid reordering columns. Since select * should never be used in a production system, adding columns at the end is not aproblem for a well-designed system. Order of columns inthe table should in genral not be messed with.
精彩评论