开发者

SQL SELECT speed int vs varchar

开发者 https://www.devze.com 2022-12-21 05:24 出处:网络
I\'m in the process of creating a table and it made me wonder. If I store, say cars that has a make (fx BMW, Audi ect.), will it make any difference on the query speed if I store the make as an int o

I'm in the process of creating a table and it made me wonder.

If I store, say cars that has a make (fx BMW, Audi ect.), will it make any difference on the query speed if I store the make as an int or varchar.

So is

SELECT * FROM table WHERE make = 5 AND ...;

Faster/slower than

SELECT * FROM table WHERE make = 'audi' AND ...;

or will the speed be more or less the same?开发者_JS百科


Int comparisons are faster than varchar comparisons, for the simple fact that ints take up much less space than varchars.

This holds true both for unindexed and indexed access. The fastest way to go is an indexed int column.


As I see you've tagged the question postgreql, you might be interested in the space usage of different date types:

  • int fields occupy between 2 and 8 bytes, with 4 being usually more than enough ( -2147483648 to +2147483647 )
  • character types occupy 4 bytes plus the actual strings.


Some rough benchmarks:

4 million records in Postgres 9.x

Table A = base table with some columns
Table B = Table A + extra column id of type bigint with random numbers
Table C = Table A + extra column id of type text with random 16-char ASCII strings

Results on 8GB RAM, i7, SSD laptop:

Size on disk:                A=261MB        B=292MB        C=322MB
Non-indexed by id: select count(*), select by id: 450ms same on all tables
Insert* one row per TX:       B=9ms/record        C=9ms/record
Bulk insert* in single TX:    B=140usec/record    C=180usec/record
Indexed by id, select by id:  B=about 200us       C=about 200us

* inserts to the table already containing 4M records

so it looks like for this setup, as long as your indexes fit in RAM, bigint vs 16-char text makes no difference in speed.


It will be a bit faster using an int instead of a varchar. More important for speed is to have an index on the field that the query can use to find the records.

There is another reason to use an int, and that is to normalise the database. Instead of having the text 'Mercedes-Benz' stored thousands of times in the table, you should store it's id and have the brand name stored once in a separate table.


Breaking down to the actual performance of string comparison versus non-floats, in this case any size unsigned and signed does not matter. Size is actually the true difference in performance. Be it 1byte+(up to 126bytes) versus 1,2,4 or 8 byte comparison... obviously non-float are smaller than strings and floats, and thus more CPU friendly in assembly.

String to string comparison in all languages is slower than something that can be compared in 1 instruction by the CPU. Even comparing 8 byte (64bit) on a 32bit CPU is still faster than a VARCHAR(2) or larger. * Again, look at the produced assembly (even by hand) it takes more instructions to compare char by char than 1 to 8 byte CPU numeric.

Now, how much faster? depends also upon the volume of data. If you are simply comparing 5 to 'audi' - and that is all your DB has, the resulting difference is so minimal you would never see it. Depending upon CPU, implementation (client/server, web/script, etc) you probably will not see it until you hit few hundred comparisons on the DB server (maybe even a couple thousand comparisons before it is noticeable).

  • To void the incorrect dispute about hash comparisons. Most hashing algorithms themselves are slow, so you do not benefit from things like CRC64 and smaller. For over 12 years I developed search algorithms for multi-county search engines and 7 years for the credit bureaus. Anything you can keep in numeric the faster... for example phone numbers, zip codes, even currency * 1000 (storage) currency div 1000 (retrieval) is faster than DECIMAL for comparisons.

Ozz


Index or not, int is a lot faster (the longer the varchar, the slower it gets).

Another reason: index on varchar field will be much larger than on int. For larger tables it may mean hundreds of megabytes (and thousands of pages). That makes the performance much worse as reading the index alone requires many disk reads.


In general the int will be faster. The longer is the varchar the slower it gets


Hint: If the possible values for the field make will never (or rarely) change, you can use ENUM as a compromise. It combines good speed with good readability.


If you turn on indexing on either of the fields, it will be faster. As for your question, i think intis faster than varchar.


Somewhat relative. Yes, INTs will be faster, but the question is if it is noticeable in your situation. Are the VARCHARs just some small words, or longer texts? and how many rows are in the table? If there are just a few rows it will most likely be entirely buffered in memory (when requested often), in that case you wont notice much difference. Then of course there is indexing, which gets more important when the table grows. Using SSD's might be faster then HD's with optimized queries. Also good disk-controllers sometimes speed up queries >10x . This might leave room for just using VARCHARs which makes reading and writing queries easier (no need to write complex joins) and speed up development. Purists however will disagree and always normalize everything.

0

精彩评论

暂无评论...
验证码 换一张
取 消