For example if I have a table User, I want to store gender or sex, I'll add开发者_如何学JAVA a column like sex
.
Is it really worth to use an integer and then map it in my favorite programming language?
Like 1 => 'Male'
and 2 => 'Female'
Is there any performance reason to do that?
Or could I safely use a varchar which more meaning with 'female' or 'male' almost like I was using mysql ENUM
?
Edit: I here and there that it is sometimes better, sometimes it doesn't matter, so I more looking for benchmark or something over a "it is better" answer.
I mean I think using varchar is actually more meaningfull than an integer, and I would use an integer only if performance are more than 0.3% or something.
If this is for some homebrew website or application that will serve 10 people, then do whatever you want, performance won't make a difference.
If this is for something real then skip rolling your own implementation of gender and follow the ISO standard for sex. Or at least adhere to standards wherever they exist (thanks Joe Celko!)
0 = not known
1 = male
2 = female
9 = not applicable
Always rightsize your data type
Disk space savings:
At my last job, the pedantic people in charge of designing tables created a column as decimal with 0 precision because it should only be N digits. The difference in storage cost between that and a whole number data type was 1 or 2 bytes. However, as this table was very large the aggregate cost savings of having the smaller data type was measure in gigabytes on the table alone.Access savings: A second cost that most don't think about is the cost to read information from disk or to keep data in memory. In SQL Server, data is stored in 8K pages. If you are using fat data types, it will take more reads to get data off disk and then you can store subsequently fewer data pages in memory. Pulling data off of disk is where you will incur the biggest performance cost. If you want to speed up things that use a database, don't bone the physical implementation.
Implement as the smallest allowable type in your system that will cover the problem domain. For something like gender, use a tinyint (MySQL, SQL Server) or number(5,0) in Oracle and you'll be spending 1 bye of storage for each gender.
Internationlization
M = Male, F = Female, that seems obvious. ¿Verdad? Aqui, nosotros hablamos español. And that's about as far as my Spanish caries me, but my point is that if you ever need to be multi-lingual, 1 will serve males, gentes, mannlich, masculin, etc. M or Male will only serve an English speaking audience. Further more, then you run into weird presentation logic of "We need to translate everything unless it's going to $culture." It is a far cleaner design to have presentation logic is the UI and keep it out of the database.
Ortiginal Answer:
I would suggest storing it in a CHAR(1)
column as M
or F
It is expressive enough for the specific purpose AND has the speed benefit of being a single character comparison
Update 4 (fixed benchmark):
All previous benchmarks had a fatal flaw that one (the CHAR(1)
) table was MyISAM
and all other were InnoDB
. So I recreated the database with all tables using the MyISAM
and the results make much more sense now.
The error creeped in as I used the MySQLWorkbench's wizard to create the tables and forgot to change the database engine in the other tables and it defaulted to InnoDB
(I have MySQL 5.5)
So the corrected results are as follows, (I have removed all my previous benchmarks as they were invalid) :
// select queries
$query['char'] = "select count(*) from test_table where gender = 'M'";
$query['char_lower'] = "select count(*) from test_table where LOWER(gender) = 'm'";
$query['varchar'] = "select count(*) from test_table_2 where gender = 'Male'";
$query['varchar_lower'] = "select count(*) from test_table_2 where LOWER(gender) = 'male'";
$query['tinyint'] = "select count(*) from test_table_3 where gender = 1";
// benchmark result
array
'char' => float 0.35457420349121
'char_lower' => float 0.44702696800232
'varchar' => float 0.50844311714172
'varchar_lower' => float 0.64412498474121
'tinyint' => float 0.26296806335449
New Conclusion : TINYINT
Is fastest. But my recommendation would be still yo use CHAR(1)
as it would be easier for future developers to understand the database.
If you do use TINYINT
, my recommendation would be name the column ismale
instead of sex
and store 0 => Female
and 1 => male
thus making it a little more easy to understand in raw database.
The table structure for benchmark is this:
CREATE TABLE `test_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`gender` char(1) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM
Only the type of the gender column is different in the 3 tables, the types are:
CHAR(1), VARCHAR(6), TINYINT
All 3 tables have 10000
entries.
The benefit of storing as varchar
is that the data can mostly speak for itself - however, it ends there and only manifests itself in queries against the raw data which will usually be done by a developer that knows the system anyway (exposing data querying functionality to users or others would generally use an application layer, which means you could format it as desired regardless.) And this data is OK to display, but consider having to constantly parse it!
As for storing as an integer, it is a little obfuscated, but so long as it is in the data specification and mappings laid out clearly, then you reap benefits of using the data more productively in your application (using a mapping of map an integer
to an enum
is a one off thing and exposes a more usable type in terms or branching logic, removing string parsing.) It will also be more efficient than storing strings.
There is of course the route of storing 'options' in a dedicated table and having other table fields reference it, but what I've found in many projects is that this is far from ideal in terms of utilisation, unless still using mappable types - which then the table only serves to obscure things a little more, potentially.
It will be much faster than doing a string comparison, if you are doing any SELECTS
on it.
SELECT * FROM User WHERE Gender = 'female'
Example:
Say I have female
as a string. Its 6 characters long. So it has to do a comparison 6 times for every record, and that is using strict casing - it gets more costly to do case insensitive.
Now say I have 123456
as an int. Its one value, not 6 to compare, even though the human readable string is 6 characters long.
Aside
Ideally, Male
and Female
would be another table and your User
table would have a FK to that table.
Integer are much faster than doing string comparison, but I think your better of using chars 'M' or 'F'. If people dump the table they'll know exactly what you intended and its better than maintaining a join table. Unless we're going to be running across new sexes soon.
it depends.. but generally yes.
ints take up less space on disk.
ints compare faster
ints travel over the network faster (smaller)
so if it is one row only, and you query it once a day - you'll never notice, but in general, you will benefit.
This is a non-brainer: use ISO 5218 values. Why reinvent the wheel and make your locale-specific and less portable?
Because the set of values is small and stable, you can get away with using a CHECK
constraint... oops, I mean, for MySQL create a lookup table with a foreign key!
精彩评论