The project I'm working on has, for each column that needs to be searched, a second column called "ft[columnname]" that has a FULLTEXT index; only this second column is searched against.
This column contains an "optimized" text that is automatically generated from the original column in the following way:
- The string is lowercased
- All accents are removed
- All punctuation and unsearchable characters are removed
- All duplicated words are removed
- All words are sorted from the longest to the shortest
- Other transformations that I don't really understand (related to combined words)
For example, "I like Pokémon, especially Pikachu!" becomes "especially pokemon pikachu like i".
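For illustration only, here is a minimal Python sketch of the listed steps. The function name `normalize_for_fulltext` is hypothetical, "unsearchable" is assumed to mean anything other than letters, digits and underscores, and the combined-word transformations are left out because they aren't described. It reproduces the example above: ties in length ("pokemon" and "pikachu") keep their original order because Python's sort is stable.

```python
import re
import unicodedata


def normalize_for_fulltext(text: str) -> str:
    """Hypothetical sketch of the 'ft' column transformation described above.

    Only the listed steps are implemented; the combined-word
    transformations are not reproduced.
    """
    # 1. Lowercase the string.
    text = text.lower()
    # 2. Remove accents: decompose characters (e.g. 'é' -> 'e' + combining acute)
    #    and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 3. Replace punctuation and other non-word characters with spaces.
    #    (Assumption: "unsearchable" = anything not a letter, digit or underscore.)
    text = re.sub(r"[^\w\s]", " ", text)
    # 4. Remove duplicated words, keeping the first occurrence of each.
    words = list(dict.fromkeys(text.split()))
    # 5. Sort from longest to shortest; sorted() stays stable with reverse=True,
    #    so words of equal length keep their original relative order.
    words = sorted(words, key=len, reverse=True)
    return " ".join(words)


print(normalize_for_fulltext("I like Pokémon, especially Pikachu!"))
# especially pokemon pikachu like i
```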
Is there any performance benefit to this, even a very tiny one? The data in the database never changes dynamically.
There might be a functional benefit for your specific application, but storing the data twice is largely a performance hit, not a benefit.
Since your data is now roughly twice as big, then for a sufficiently large data set only half as much of it can be held in the various levels of caching (e.g. MySQL's and the OS's), so you will be reading from disk much more, and disk is normally the bottleneck.
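The caching argument can be put in numbers with a deliberately simplistic model that assumes every row is equally likely to be read. The cache and data sizes below are hypothetical. Once the data is larger than the cache, doubling it halves the fraction that can stay cached; while everything still fits, doubling costs nothing, which is why the "sufficiently large data set" condition matters.

```python
def cached_fraction(cache_bytes: float, data_bytes: float) -> float:
    """Fraction of the data that fits in cache (simplistic uniform-access model)."""
    return min(1.0, cache_bytes / data_bytes)


GIB = 1024 ** 3
cache = 4 * GIB  # hypothetical cache size (MySQL buffers + OS page cache)
data = 8 * GIB   # hypothetical size of the original text columns

before = cached_fraction(cache, data)     # original columns only
after = cached_fraction(cache, 2 * data)  # after adding an "ft" copy of each column
small = cached_fraction(cache, 1 * GIB)   # a small data set that fits either way

print(before, after, small)
```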
Having said that, if you use a single-byte character set on the ft indexed column but a multi-byte character set on the original text, your full-text index may be much smaller than it would otherwise have been.
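A small Python illustration of why a single-byte character set is possible here. After accents and punctuation are stripped, the optimized text is pure ASCII, so it fits in `latin1`; the original needs a multi-byte encoding for characters like "é". MySQL's `utf8mb4` allows up to 4 bytes per character and `latin1` uses exactly 1. The "worst case" figures assume storage is sized by the character set's maximum character width. Whether and where the full-text index actually does this depends on the storage engine and version, so treat that column as an assumption rather than a measurement.

```python
# Maximum bytes per character for two MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb4": 4}

original = "I like Pokémon, especially Pikachu!"
optimized = "especially pokemon pikachu like i"  # example from the question


def actual_bytes(text: str, python_codec: str) -> int:
    """Bytes the text actually occupies in the given encoding."""
    return len(text.encode(python_codec))


def worst_case_bytes(text: str, charset: str) -> int:
    """Bytes needed if space is sized by the charset's maximum character width
    (assumption about how the index reserves space)."""
    return len(text) * MAX_BYTES_PER_CHAR[charset]


# The optimized text is pure ASCII, so a single-byte charset can hold it.
assert optimized.isascii()

print(actual_bytes(original, "utf-8"), actual_bytes(optimized, "latin-1"))  # 36 33
print(worst_case_bytes(original, "utf8mb4"), worst_case_bytes(optimized, "latin1"))  # 140 33
```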
Honestly, you should not do this with a second column, because relying on a FULLTEXT index implies you are using the MyISAM storage engine for a production table (or go ahead, if you can afford to lose some data).
Since you clearly care about performance, you should consider a capable full-text search engine such as Sphinx: http://www.sphinxsearch.com/