
Importance of varchar length in MySQL table

Source: https://www.devze.com, 2022-12-15 00:45 (from the web)

I have a MySQL table where rows are inserted dynamically. Because I cannot be certain of the length of the strings and do not want them cut off, I make them varchar(200), which is generally much bigger than I need. Is there a big performance hit in giving a varchar field much more length than necessary?


There's one possible performance impact: in MySQL, temporary tables and MEMORY tables store a VARCHAR column as a fixed-length column, padded out to its maximum length. If you design VARCHAR columns much larger than the greatest size you need, you will consume more memory than you have to. This affects cache efficiency, sorting speed, etc.
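A rough way to see this effect yourself (table and column names here are made up for illustration): create the same data under two MEMORY-engine definitions and compare the reported row footprint, which reflects the declared maximum rather than the actual string lengths.

```sql
-- Hypothetical sketch: MEMORY tables store VARCHAR as fixed-width,
-- padded to the declared maximum.
CREATE TABLE wide_mem   (s VARCHAR(200)) ENGINE=MEMORY;
CREATE TABLE narrow_mem (s VARCHAR(20))  ENGINE=MEMORY;

INSERT INTO wide_mem   VALUES ('hello');
INSERT INTO narrow_mem VALUES ('hello');

-- Avg_row_length should be much larger for wide_mem despite identical data:
SHOW TABLE STATUS WHERE Name IN ('wide_mem', 'narrow_mem');
```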


No, in the sense that if the values you're storing in that column are always (say) less than 50 characters, declaring the column as varchar(50) or varchar(200) has the same performance.


VARCHAR is ideal for the situation you describe, because it stands for "variable character" - the limit, based on your example, would be 200 characters but anything less is accepted and won't fill the allotted size of the column.

VARCHAR also takes less space: values are stored as a one-byte or two-byte length prefix plus the data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, and two length bytes if values may require more than 255 bytes.

For more information comparing the MySQL CHAR to VARCHAR datatypes, see this link.


Size is performance! The smaller the size, the better. Not today or tomorrow, but one day your tables will grow to a size where serious bottlenecks appear, no matter what design you laid out. You can, however, foresee some of the bottlenecks likely to happen first during your design phase, and extend the time your database performs fast and happily before you need to rethink your schema or scale horizontally by adding more servers.

In your case there are several performance leaks you can run into: big joins are nearly impossible with long varchar columns; indexing on those columns is a real killer; your disk has to store the data; one memory page holds fewer rows, so table scans are much slower; and the query cache is unlikely to help you here.

You have to ask yourself: How many inserts per year may happen? What is the average length? Do I really need more than 200 characters, or can I enforce that limit in my application front-end, even by informing users of the maximum length? Can I split the table into a narrow one for fast indexing and scanning, and another one holding the additional, less frequently needed data of expanding size? Can I sort the possible varchar data into categories, extract some of it into a few smaller columns (perhaps int- or bool-typed), and narrow the varchar column that way?
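The table-splitting idea from the questions above could look something like this (all names are hypothetical): keep a narrow, heavily indexed "hot" table and move the bulky text into a 1:1 side table that is joined only when actually needed.

```sql
-- Sketch of a vertical split. The hot table stays narrow for fast
-- index scans; the bulky text lives in a side table.
CREATE TABLE item (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    category TINYINT NOT NULL,        -- extracted from free text into a small typed column
    name     VARCHAR(60) NOT NULL,    -- conservative, measured limit
    INDEX (category, name)
);

CREATE TABLE item_detail (
    item_id     INT UNSIGNED NOT NULL PRIMARY KEY,
    description VARCHAR(2000),        -- rarely read, kept out of the hot table
    CONSTRAINT fk_item FOREIGN KEY (item_id) REFERENCES item (id)
);
```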

You can do a lot here. It may be best to go with a first assumption and then re-design step by step using real-life measured performance data. Good luck.


Some of you are mistaken in thinking that a varchar(200) takes up more table space on disk than a varchar(20). This is not the case. Only when you go beyond 255 characters does MySQL use an extra byte to record the length of the varchar field's data.
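A hedged way to check this claim for yourself (table names are made up): load identical data into two InnoDB tables that differ only in the declared varchar length, then compare the on-disk row storage reported by information_schema.

```sql
-- Identical data under VARCHAR(20) vs VARCHAR(200); InnoDB row storage
-- should come out (near-)identical, since varchar stores only the
-- actual bytes plus a length prefix.
CREATE TABLE t20  (s VARCHAR(20))  ENGINE=InnoDB;
CREATE TABLE t200 (s VARCHAR(200)) ENGINE=InnoDB;

INSERT INTO t20  VALUES ('hello'), ('world');
INSERT INTO t200 VALUES ('hello'), ('world');

SELECT table_name, data_length
FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_name IN ('t20', 't200');
```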


Performance? No. Disk storage? Yes, but it's cheap and plentiful. Unless your database will grow to terabyte scale you're probably okay.


There can be performance hits - but usually not on a level that most users would notice.

When the size of each field is known in advance, MySQL knows exactly how many bytes lie between each field/row and can page forward without reading all the data. Using variable-length columns diminishes this ability to optimize.

Does varchar result in performance hit due to data fragmentation?

Even better, char vs varchar.

For most uses, you'll be fine with either - but there is a difference, and for large scale databases, there are reasons why you'd pick one or the other.


In most scenarios you should treat a varchar column the same as you would a char column, and set the length conservatively. Don't think of the var modifier as something that affects your decision about the maximum length; it is better seen as a performance hint that the supplied strings will be of varying lengths.

It's not a directive that database internals have to follow strictly; it can be completely ignored. Take care with this, however, as the implementation can sometimes leak (fixed lengths and padding, for example) even though it shouldn't in an ideal world.

If you have a varchar(255), you have no guarantee that, performance-wise, it will always behave any differently from a char(255) in all circumstances.

It can seem easy to set it to something such as 255 or 65535, in line with the advice the manual gives about storage requirements. This gives the impression that any value between 0 (yes, that's a thing) and 255 will have the same impact. However, that can't be fully guaranteed.

The documented storage requirements do tend to hold, or at least be a good indicator, for decent and mature persistent storage engines in terms of row storage. They are a weaker indicator for things such as indexes.

Exactly how long a piece of string should be is sometimes a difficult question, so the aim is to set the length to the tightest upper bound you know values will stay within, such that the limit itself has no impact. Unfortunately this is often left to the user to work out, and it's really somewhat arbitrary. You can't simply say "never oversize a string", because there may be cases where you're not exactly sure.

You should ensure that MySQL throws an error when a string is too long rather than truncating it, so that error emissions at least tell you when a column might be too short. Keep in mind that resizing columns to enlarge or shrink them can be an expensive DDL operation.
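The error-instead-of-truncate behaviour comes from MySQL's strict SQL modes; a minimal sketch (the table and column names are illustrative):

```sql
-- Make over-long strings fail loudly instead of being silently truncated.
-- STRICT_ALL_TABLES also covers non-transactional engines.
SET SESSION sql_mode = 'STRICT_ALL_TABLES';

CREATE TABLE note (body VARCHAR(10));
INSERT INTO note VALUES ('this string is longer than ten characters');
-- ERROR 1406 (22001): Data too long for column 'body' at row 1
```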

Character set should also be considered where length and performance come into play, since the length refers to characters rather than bytes. With utf8 (not utf8mb4), for example, varchar(255) is really varbinary(3 * 255). It's hard to know how things like this will really play out without running tests and digging into source code and documentation, so there is scope for excessive length to have an unexpectedly inflated impact.

This doesn't only apply to performance. If you one day need to change the character set of a varchar column to a larger one, you might hit some limit with no recourse, if you allowed gratuitously long strings that could have been avoided. This is normally a fairly niche problem, but it does come up; it was recently a significant problem with the introduction of utf8mb4 in MySQL and indexes, which have a limit on key length.
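The utf8mb4 key-length problem mentioned above can be sketched like this, assuming the old 767-byte InnoDB index key limit (the default before MySQL 5.7's large-prefix settings); table and column names are hypothetical:

```sql
-- Each utf8mb4 character can take up to 4 bytes, so a full index on
-- VARCHAR(255) needs 255 * 4 = 1020 bytes, over the 767-byte limit.
CREATE TABLE u (name VARCHAR(255)) CHARACTER SET utf8mb4;

-- ALTER TABLE u ADD INDEX (name);   -- fails: "Specified key was too long"

-- Common workarounds: shorten the column, or index only a prefix:
ALTER TABLE u ADD INDEX (name(191));  -- 191 * 4 = 764 bytes, fits
```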

If it turns out that MAX(LENGTH(column)) is always < 64 (for example, because an input limit was decided on that the column definition doesn't match) but you have varchar(255), then there's a good chance you'll be using four times more space than needed in some scenarios.

This might include:

  • Different engines, some may ignore it altogether.
  • Buffer sizes, for example update or insert might have to allocate the full 255 (although I have not checked the source code to prove this, it is only a hypothetical).
  • Indexes, this will be immediately obvious if you try to make a composite key from a lot of varchar(255) columns.
  • Intermediate tables and possibly result sets. Given the way transactions work, it might not always be possible for something to use the actual max length of strings in a column as opposed to the defined limit.
  • Internal predictive optimisations might take the max length as an input.
  • Changes in database implementation versions.

As a rule of thumb, there's really no need for a varchar to be longer than it has to be anyway, performance issues or not, so I recommend sticking to that when you can. Taking the extra effort to sample the size of your data, enforce a true limit, or find out the true limit by asking or researching is the ideal approach.

When you can't, and you want to fall back to something such as varchar(255) when in doubt, I recommend doing the science. This might consist of duplicating the table, reducing the size of the varchar column, copying the data into it from the original, and comparing the size of the index and row data (index the column as well, and also try it as a primary key, which might behave differently in InnoDB, since rows are ordered by the primary key). At the very least you'll know whether there's an impact on IO, which tends to be one of the most sensitive bottlenecks. Testing memory usage is more difficult; it's hard to test that exhaustively. I would recommend testing likely worst cases (queries with lots of intermediate in-memory results; check with EXPLAIN for large temp tables; etc.).
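One way to run that experiment (all names here are hypothetical placeholders for your own table and column):

```sql
-- Clone the table with a narrower column, copy the data across,
-- and compare on-disk row and index sizes.
CREATE TABLE orig_copy LIKE orig;
ALTER TABLE orig_copy MODIFY col VARCHAR(64);
INSERT INTO orig_copy SELECT * FROM orig;

SELECT table_name, data_length, index_length
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name IN ('orig', 'orig_copy');
```

With strict SQL mode enabled, the INSERT … SELECT will also fail if any existing value exceeds the new limit, which doubles as a check that 64 characters really is an upper bound.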

If you know there aren't going to be many rows in the table, and you aren't going to use the column for joins, indexes (especially composite or unique ones), etc., then you most likely won't have many problems.


Being varchar, rather than plain char, the stored size is an internal length field plus the string itself. So using varchar(200) is not very different from using varchar(150), except that you have the potential to store more.

And you should consider what happens on an update, when a row grows. But if this is rare, then you should be fine.


As the datatype name suggests, this is VARCHAR, i.e. variable-character data storage. The MySQL engine allocates memory according to the stored data itself, so to my knowledge there is no performance hit.

