I would like to perform an sql 开发者_如何学Gosearch and I would like to get best results. I tried some things but they didn't work well. I have got two columns named subject and content
For example we will search "search this keywords" text on subject and content area. First I'm searching "search this keywords" then searching "search" and "this" and "keywords"
I would like to retrieve subject's results on top and I would like to retrieve best results liking "search this keywords".My query sometimes works well sometimes not.
How should I write this query
Thanks..
I think you're saying that you want to perform multiple SQL queries against your database and then combine the results and set a "weighting" to a subject match over a content match.
select messageid, textstring, max(weight) from (
-- exact subject match
select messageid, substr(subject,1,100) textstring, 100 weight
from mytable
where subject='search this keywords'
union
-- partial subject match
select messageid, substr(subject,1,100), 90 weight
from mytable
where subject like '%search this keywords%'
union
select messageid, substr(subject,1,100), 80 weight
from mytable
where subject like '%search%'
union
select messageid, substr(subject,1,100), 80 weight
from mytable
where subject like '%this%'
union
select messageid, substr(subject,1,100), 80 weight
from mytable
where subject like '%keywords%'
union
-- partial content match
select messageid, substr(content,1,100), 70 weight
from mytable
where content like '%search this keywords%'
union
select messageid, substr(content,1,100), 60 weight
from mytable
where content like '%search%'
union
select messageid, substr(content,1,100), 60 weight
from mytable
where content like '%this%'
union
select messageid, substr(content,1,100), 60 weight
from mytable
where content like '%keywords%'
)
group by
messageid, textstring,
Try this
select * from (
Select sch, rank,
case when sch like '%search this keywords%' then 0
when sch like '%search%' then 1
when sch like '%this%' then 2
when sch like '%keywords%' then 3 end ord
from
(
select subject as sch, 1 as rank from mytable
union all
select content, 2 as rank from mytable
) as x
) as y
where ord is not null
order by rank, ord
Implementing a simple 'Full Text Search' like tables would be a way.
CREATE TABLE YourTable (id int, subject varchar(256), content varchar(8000))
CREATE TABLE Keywords (key_id int, keyw varchar(50), relevanceModifier float)
CREATE TABLE SubjectsKeywords (key_fk int, yourTable_fk int, quantity int)
CREATE TABLE ContentKeywords (key_fk int, yourTable_fk int, quantity int)
When you insert in YourTable, fire a trigger to:
Split subject and content columns by spaces, commas, etc into words.
Optionally, avoid "stop words" like "the", "they", "to", etc. This is called stemming if i'm not mistaken.
Each word should be inserted in tables SubjectsKeywords, ContentKeywords and Keywords.
Optionally, set relevanceModifier. A very simple criteria would be to use the string length.
Optionally, count each ocurrence and track it quantity fields.
Then your query would be like this:
select max(t.relevance), yourtable.id, MAX([subject]), MAX(content)
from
(
/* exact match and 'contains' match */
select 100 as relevance, id
from YourTable
where [subject] like '%search this keywords%'
UNION
/* keyword match */
select 70 as relevance, yt.id
from YourTable as yt
join SubjectsKeywords on id = yourTable_fk
join Keywords as k on k.id = key_fk
where keyw in ('search', 'this', 'keywords')
UNION
select 40 as relevance, id
from YourTable
where [subject] like '%search this keywords%'
UNION
select 10 as relevance, yt.id
from YourTable as yt
join ContentKeywords on yt.id = yourTable_fk
join Keywords as k on k.id = key_fk
where keyw in ('search', 'this', 'keywords')
) as T
join yourtable on t.id = yourtable.id
group by t.id
order by max(relevance) desc
, yourtable.id ASC /*So that the result will always be in the same order*/
Notes:
Trigger is a way to do it if you have little control of you application or if it is a maintenance nightmare.
Later it you could improve it by adding a soundex, so that, you can search even mispelled keywords.
RelevanceModifier, Quantity field can be use to calculate more relevant results.
As it may be fast enough, it may be usefull as an autocomplete feature for your application, in which case you'd like to limit the results to let say 256 at most.
I hope this gives you and idea, and so you decide what suits you best.
精彩评论