开发者

Improving search performance

开发者 https://www.devze.com 2022-12-11 22:22 出处:网络
In one of my MySQL tables, I have following 开发者_高级运维columns: Skillsvarchar(80) Industryvarchar(40)

In one of my MySQL tables, I have following 开发者_高级运维columns:

Skills      varchar(80)
Industry    varchar(40)
Address     varchar(100)

Skills might include text like: C/C++, MS Office, Linux etc. Industry might include text like: Finance, IT, etc. Address contains complete postal address along with city name. There is no separate city column.

On my web page I have Search boxes where in user can type keywords for each of the columns. I use SQL query as given below:

Select studentname where skills like '%...%';

I want to improve the search results as well as performance. For example, the user might have entered a skill like Linex instead of Linux. So, my search should return the approx. match.

Please let me know how I can make search more accurate and increase performance. At present, I have defined index on skills, address and industry columns. But the performance is not quite well.

The table engine is InnoDB.

edit

The problem is that we receive pre-filled MS Excel sheets from various institutions. Our .NET application reads the column values of the Excel sheet and stores it in the remote database. For splitting tables, we need change in our application.


The SQL clause

like '%...%'

is the single most destructive thing you can do if you want performance from your database.

What you really should be doing is making sure that things like skills, industries and so on, are broken out into other tables with fixed values (like 'C', 'C++', 'SQL' and so on).

Then have a many-to-many table between person and skills. For example:

People:
    PersonId primary key.
    Other person details.
Skills:
    SkillId primary key.
    SkillName.
    Other skill details.
PeopleSkills:
    PersonId references People(PersonId).
    SkillId references Skills(SkillId).
    primary key (PersonId,SkillId).
    index on (SkillId).

This sort of layout will both improve the speed of your queries massively and make incorrect data entry impossible if you only allow entry of search terms from the Skills table (no 'Linex' possible where you meant 'Linux', simply because 'Linex' isn't in the skills table).

The one unassailable rule I follow with table design is: if you're trying to extract a bit of information from within a column, that information should be put in its own column. The number of performance issues people suffer because they created tables with a single column holding comma-separated values (where they want to extract individual values from that column) should be testament to that.

The downside of having to ensure all skills and industries are in a separate table will be more than made up for by the increased speed and accuracy. Databases should be always designed for third normal form. They can be regressed to 2NF for performance reasons if you understand the consequences (and mitigate the possibility of incorrect data by using triggers or calculated columns) but this is rarely necessary.


I'd recommend you to use Fulltext search with indice. InnoDB doesn't support the fulltext, but you can use external engines like Apache Lucene, Zend_Search_Lucene (PHP), etc


Another solution is to use a different search technology. Have a look at the Zend Search Lucene framework. This is based on the JAVA version of Lucene.

It has a much better way of searching content. You can search single words, phrases, ranges, , fuzzy, proximity, boosting, highlighting etc.

However your search content would need to be stored as flat files as well in the database and you would need to keep content in sync. However that said the power of the search is well worth it. Also it is fast, very fast.

Google: php lucene zend for more information.

Also see this for a great tutorial


As the other posters have mentioned normalise your database first.

To deal with spelling mistakes when searching, have a look at SOUNDS LIKE


you might want to give sphinx a try


Separate skills in a table just for that, and then use the Id to connect to your current table, some like this:

Skills: Id, Name
Skills_YourTable: Skills_Id, YouTable_Id
YouTable: Id, another fields.

EX:

Select y.fields 
from YourTable as y 
     join Skills_YourTable as sy 
     join Skills as s
 where s.Name = 'MS Office'
0

精彩评论

暂无评论...
验证码 换一张
取 消