Why use shorter VARCHAR(n) fields?

开发者 https://www.devze.com 2023-01-02 23:12 Source: web
It is frequently advised to choose database field sizes to be as narrow as possible. I am wondering to what degree this applies to SQL Server 2005 VARCHAR columns: Storing 10-letter English words in a VARCHAR(255) field will not take up more storage than in a VARCHAR(10) field.

Are there other reasons to restrict the size of VARCHAR fields to stick as closely as possible to the size of the data? I'm thinking of

  • Performance: Is there an advantage to using a smaller n when selecting, filtering and sorting on the data?
  • Memory, including on the application side (C++)?
  • Style/validation: How important do you consider restricting column size to force nonsensical data imports to fail (such as 200-character surnames)?
  • Anything else?

Background: I help data integrators with the design of data flows into a database-backed system. They have to use an API that restricts their choice of data types. For character data, only VARCHAR(n) with n <= 255 is available; CHAR, NCHAR, NVARCHAR and TEXT are not. We're trying to lay down some "good practices" rules, and the question has come up if there is a real detriment to using VARCHAR(255) even for data where real maximum sizes will never exceed 30 bytes or so.

Typical data volumes for one table are 1-10 million records with up to 150 attributes. Query performance (SELECT, with frequently extensive WHERE clauses) and application-side retrieval performance are paramount.


  1. Data Integrity - By far the most important reason. If you create a column called Surname that is 255 characters, you will likely get more than surnames. You'll get first name, last name, middle name. You'll get their favorite pet. You'll get "Alice in the Accounting Department with the Triangle hair". In short, you will make it easy for users to use the column as a notes/surname column. You want the cap to impede the users that try to put something other than a surname into that column. If you have a column that calls for a specific length (e.g. a US tax identifier is nine characters) but the column is varchar(255), other developers will wonder what is going on and you will likely get crap data as well.

  2. Indexing and row limits. In SQL Server a row is limited to 8060 bytes IIRC. Lots of fat non-varchar(max) columns with lots of data can quickly exceed that limit. In addition, index keys have a cap of 900 bytes in width IIRC. So, if you wanted to index on your surname column and some others that contain lots of data, you could exceed this limit.

  3. Reporting and external systems. As a report designer you must assume that if a column is declared with a max length of 255, it could have 255 characters. If the user can do it, they will do it. Thus, to say, "It probably won't have more than 30 characters." is not even remotely the same as "It cannot have more than 30 characters." Never rely on the former. As a report designer, you have to work around the possibilities that users will enter a bunch of data into a column. That either means truncating the values (and if that is the case why have the additional space available?) or using CanGrow to make a lovely mess of a report. Either way, you make it harder on other developers to understand the intent of the column if the column size is so far out of whack with the actual data being stored.
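To put rough numbers on point 2, here is a minimal Python sketch that adds up declared column widths and compares them with the two limits quoted above. The limits are taken from the answer as stated ("IIRC"), the 150-column table comes from the question, and the four-column index is a made-up example. Per-row and per-column overhead is ignored, so real figures would be slightly higher.

```python
# Limits as quoted in the answer above; treat them as assumptions.
MAX_ROW_BYTES = 8060        # in-row data limit for a single row
MAX_INDEX_KEY_BYTES = 900   # maximum width of an index key


def declared_width(varchar_lengths):
    """Sum the declared maximum widths of single-byte VARCHAR(n) columns.

    Storage overhead is ignored, so this is a lower bound on the worst case.
    """
    return sum(varchar_lengths)


# Table from the question: 150 attributes, every one declared VARCHAR(255).
wide_row = declared_width([255] * 150)
# Same table with columns sized close to the data (about 30 bytes each).
narrow_row = declared_width([30] * 150)

# Hypothetical composite index over four of those columns.
wide_key = declared_width([255] * 4)
narrow_key = declared_width([30] * 4)

print("wide row:", wide_row, "over limit:", wide_row > MAX_ROW_BYTES)
print("narrow row:", narrow_row, "over limit:", narrow_row > MAX_ROW_BYTES)
print("wide key:", wide_key, "over limit:", wide_key > MAX_INDEX_KEY_BYTES)
print("narrow key:", narrow_key, "over limit:", narrow_key > MAX_INDEX_KEY_BYTES)
```

With everything declared as VARCHAR(255), the worst-case row is 38250 bytes and the four-column key is 1020 bytes, both over the quoted limits. Sized to about 30 bytes, the same row is 4500 bytes and the key 120 bytes.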


I think that the biggest issue is data validation. If you allow 255 characters for a surname, you WILL get a surname that's 200+ characters in your database.

Another reason is that if you allow the database to hold 255 characters you now have to account for that possibility in every system that touches your database. For example, if you exported to a fixed-width column file all of your columns would have to be 255 characters wide, which could be pretty annoying or even problematic. That's just one example where it could cause a problem.
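The fixed-width case can be sketched in a few lines of Python. The column names, widths and sample row are hypothetical; the point is only that the export has to reserve the declared width, not the typical width.

```python
def fixed_width_line(values, widths):
    """Pad each value to its declared column width and join into one record."""
    for value, width in zip(values, widths):
        if len(value) > width:
            raise ValueError(f"value {value!r} does not fit in {width} characters")
    return "".join(value.ljust(width) for value, width in zip(values, widths))


# Hypothetical row: surname, first name, state code.
row = ["Smith", "Alice", "NL"]

# Every column declared VARCHAR(255): the file must reserve 255 characters each.
wide = fixed_width_line(row, [255, 255, 255])
# Columns declared close to the real data.
narrow = fixed_width_line(row, [30, 30, 2])

print(len(wide), len(narrow))
```

The same three values take 765 characters per record in the first layout and 62 in the second.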


One good reason is validation.

For example, in Holland a social security number is always 9 characters long. If the column does not allow more than that, a longer value can never occur.

If you allow more and, for some unknown reason, a 10-character value turns up, you will need to add checks (which you otherwise wouldn't need) to verify that the value is 9 characters long.
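For illustration, this is roughly the kind of check that ends up in application code when the column itself is too wide to reject bad values. It is a minimal Python sketch; the function name, column names and sample rows are all hypothetical.

```python
def find_overlong(rows, max_lengths):
    """Return (row_index, column, length) for each value longer than its limit."""
    problems = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            limit = max_lengths.get(column)
            if limit is not None and len(value) > limit:
                problems.append((i, column, len(value)))
    return problems


# Intended lengths: the limits a narrow column would have enforced by itself.
max_lengths = {"ssn": 9, "surname": 30}

rows = [
    {"ssn": "123456789", "surname": "Jansen"},
    {"ssn": "1234567890", "surname": "De Vries"},  # 10 characters: invalid
]

problems = find_overlong(rows, max_lengths)
print(problems)
```

Here the second row is flagged because its `ssn` value has 10 characters. With the column declared as VARCHAR(9), the database would reject that row and this extra code would not be needed.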


1) Readability & Support

A database developer could look at a field called StateCode with a length of varchar(2) and get a good idea of what kind of data that field holds, without even looking at the contents.

2) Reporting

When your data has no length constraint, you are relying on the developer to ensure that the column data is all similar in length. If the developer has failed to keep the column data consistent, reports on that data will be inconsistent and look funny.

3) SQL Server Data Storage

SQL Server stores data on 8k "pages" and from a performance standpoint it is ideal to be as efficient as possible and store as much data as possible on a page.

If your database is designed to store every string column as varchar(255), "bad" data could slip into one of those fields (for example a state name might slip into a StateCode field that is meant to be 2 characters long), and cause unnecessary and inefficient page and index splits.


The other thing is that a single row of data is limited to 8060 bytes, and SQL Server uses the max length of varchar fields to determine this.

Reference: http://msdn.microsoft.com/en-us/library/ms143432.aspx
