In one of my MySQL tables, I have following 开发者_高级运维columns:
Skills varchar(80)
Industry varchar(40)
Address varchar(100)
Skills might include text like: C/C++, MS Office, Linux etc. Industry might include text like: Finance, IT, etc. Address contains complete postal address along with city name. There is no separate city column.
On my web page I have Search boxes where in user can type keywords for each of the columns. I use SQL query as given below:
Select studentname where skills like '%...%';
I want to improve the search results as well as performance. For example, the user might have entered a skill like Linex instead of Linux. So, my search should return the approx. match.
Please let me know how I can make search more accurate and increase performance. At present, I have defined index on skills
, address
and industry
columns. But the performance is not quite well.
The table engine is InnoDB.
edit
The problem is that we receive pre-filled MS Excel sheets from various institutions. Our .NET application reads the column values of the Excel sheet and stores it in the remote database. For splitting tables, we need change in our application.
The SQL clause
like '%...%'
is the single most destructive thing you can do if you want performance from your database.
What you really should be doing is making sure that things like skills, industries and so on, are broken out into other tables with fixed values (like 'C', 'C++', 'SQL' and so on).
Then have a many-to-many table between person and skills. For example:
People:
PersonId primary key.
Other person details.
Skills:
SkillId primary key.
SkillName.
Other skill details.
PeopleSkills:
PersonId references People(PersonId).
SkillId references Skills(SkillId).
primary key (PersonId,SkillId).
index on (SkillId).
This sort of layout will both improve the speed of your queries massively and make incorrect data entry impossible if you only allow entry of search terms from the Skills table (no 'Linex' possible where you meant 'Linux', simply because 'Linex' isn't in the skills table).
The one unassailable rule I follow with table design is: if you're trying to extract a bit of information from within a column, that information should be put in its own column. The number of performance issues people suffer because they created tables with a single column holding comma-separated values (where they want to extract individual values from that column) should be testament to that.
The downside of having to ensure all skills and industries are in a separate table will be more than made up for by the increased speed and accuracy. Databases should be always designed for third normal form. They can be regressed to 2NF for performance reasons if you understand the consequences (and mitigate the possibility of incorrect data by using triggers or calculated columns) but this is rarely necessary.
I'd recommend you to use Fulltext search with indice. InnoDB doesn't support the fulltext, but you can use external engines like Apache Lucene, Zend_Search_Lucene (PHP), etc
Another solution is to use a different search technology. Have a look at the Zend Search Lucene framework. This is based on the JAVA version of Lucene.
It has a much better way of searching content. You can search single words, phrases, ranges, , fuzzy, proximity, boosting, highlighting etc.
However your search content would need to be stored as flat files as well in the database and you would need to keep content in sync. However that said the power of the search is well worth it. Also it is fast, very fast.
Google: php lucene zend for more information.
Also see this for a great tutorial
As the other posters have mentioned normalise your database first.
To deal with spelling mistakes when searching, have a look at SOUNDS LIKE
you might want to give sphinx a try
Separate skills in a table just for that, and then use the Id to connect to your current table, some like this:
Skills: Id, Name
Skills_YourTable: Skills_Id, YouTable_Id
YouTable: Id, another fields.
EX:
Select y.fields
from YourTable as y
join Skills_YourTable as sy
join Skills as s
where s.Name = 'MS Office'
精彩评论