I have a VARCHAR(384) column to store emails in a MyISAM table, and I want to check whether an email already exists, but the index length limit is 333 characters (because I'm using utf8: 1000 bytes / 3 bytes per character).
Then what's the best way to check whether a specified email exists: using a FULLTEXT index, or creating another field with the email's MD5 hash in a BINARY(16) (with a collision check)?
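To make the second option concrete, here is a minimal Python sketch of the hash-plus-collision-check lookup. It is not MySQL code: `EmailTable` and `email_key` are hypothetical names, and the dictionary only stands in for an index on the BINARY(16) column. Note that the hash comparison is byte-exact, so any case normalization would have to happen before hashing.

```python
import hashlib
from collections import defaultdict


def email_key(email: str) -> bytes:
    """Return the 16-byte MD5 digest that would be stored in the BINARY(16) field."""
    return hashlib.md5(email.encode("utf-8")).digest()


class EmailTable:
    """Toy in-memory stand-in for the table: digest -> emails sharing that digest."""

    def __init__(self):
        self._by_hash = defaultdict(list)

    def exists(self, email: str) -> bool:
        # Look up by hash first, then compare the full value so that a
        # hash collision can never produce a false positive.
        return email in self._by_hash.get(email_key(email), [])

    def add(self, email: str) -> None:
        if not self.exists(email):
            self._by_hash[email_key(email)].append(email)


table = EmailTable()
table.add("alice@example.com")
print(table.exists("alice@example.com"))  # True
print(table.exists("bob@example.com"))    # False
```

In SQL terms this corresponds to matching on both the hash field and the email field in the same query, so the hash narrows the search and the email comparison confirms it.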
[EDIT]
Tests using 1M records.
Fulltext index: ~300 ms
333 length index: ~15 ms
binary(16) md5-hash: ~15 ms
So I think the second field is the best option in terms of performance, but a second field means a bigger table, and that's not good for performance or storage. So in a real scenario where emails aren't longer than 150 characters, a 150-character prefix index on the VARCHAR(384) will be enough.
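The key-length arithmetic behind these numbers can be checked with a short Python sketch. The constants are the ones quoted in the question (a 1000-byte key limit and 3 bytes per utf8 character); the function names are made up for illustration.

```python
MAX_KEY_BYTES = 1000      # key length limit quoted in the question
UTF8_BYTES_PER_CHAR = 3   # bytes reserved per character for MySQL's utf8


def max_prefix_chars(limit_bytes=MAX_KEY_BYTES, bytes_per_char=UTF8_BYTES_PER_CHAR):
    """Largest number of characters whose worst-case size fits in the key limit."""
    return limit_bytes // bytes_per_char


def prefix_fits(chars, limit_bytes=MAX_KEY_BYTES, bytes_per_char=UTF8_BYTES_PER_CHAR):
    """True if an index prefix of `chars` characters fits within the key limit."""
    return chars * bytes_per_char <= limit_bytes


print(max_prefix_chars())  # 333
print(prefix_fits(384))    # False: 384 * 3 = 1152 bytes
print(prefix_fits(150))    # True: 150 * 3 = 450 bytes
```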
In the last week, I have added 100,000 distinct emails to my database (slow week).
The longest was 45 characters long. The bell curve peaks at 21-22 characters.
So, store up to 384 if you'd like, but only index the first 45 characters. Even in the rare case when the index returns duplicates, the extra I/O to read the underlying records off disk won't kill you.
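One way to sanity-check a prefix length against your own data is to count how many distinct emails share each prefix. The following Python sketch does that; `prefix_stats` is a hypothetical helper and the sample addresses are invented, not taken from the data described above.

```python
from collections import Counter


def prefix_stats(emails, prefix_len):
    """Return (distinct_prefixes, largest_bucket) for the given prefix length.

    largest_bucket is the worst-case number of rows an index lookup on the
    prefix would return and that would then need a full comparison.
    """
    buckets = Counter(e[:prefix_len] for e in set(emails))
    return len(buckets), max(buckets.values(), default=0)


# Invented sample data for illustration only.
sample = ["alice@example.com", "alice@example.org", "bob@example.com"]

print(prefix_stats(sample, 5))   # (2, 2): both "alice" addresses share a prefix
print(prefix_stats(sample, 45))  # (3, 1): every address has its own prefix
```

If the largest bucket stays at or near 1 for the chosen length, the prefix index is almost as selective as a full-length one.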
Good luck.